Abstract

This paper introduces a class of graphical independence models
that is closed under marginalization and conditioning but that
contains all DAG independence models. This class of graphs, called
maximal ancestral graphs, has two attractive features: there is at most one
edge between each pair of vertices; every missing edge corresponds to
an independence relation. These features lead to a simple
parametrization of the corresponding set of distributions in the Gaussian case.

1 Introduction

The purpose of this paper is to develop a class of graphical Markov
models that is closed under marginalizing and conditioning, and to describe a
parametrization of this class in the Gaussian case.

A graphical Markov model uses a graph, consisting of vertices and edges, to
represent conditional independence relations holding among a set of variables
(Lauritzen, 1979; Darroch et al., 1980). Three basic classes of graphs have
been used: undirected graphs (UGs), directed acyclic graphs (DAGs), and
chain graphs, which are a generalization of the first two. (See Lauritzen, 1996;
Whittaker, 1990; Edwards, 1995.)

The associated statistical models have many desirable properties: they
are identified; the models are curved exponential families, with a well-defined
dimension; methods for fitting these models exist; unique maximum
likelihood estimates exist.

All of these properties are common to classes of models based on DAGs
and UGs. However, as we will now describe, there is a fundamental difference
between these two classes.

Markov models based on UGs are closed under marginalization in the
following sense: if an undirected graph represents the conditional
independencies holding in a distribution then there is an undirected graph that
represents the conditional independencies holding in any marginal of the
distribution. For example, consider the graph U1 in Figure 1(i), which represents
a first-order Markov chain. If we suppose that y2 is not observed, then it is
self-evident that the conditional independence y1 ⊥⊥ y4 | y3, which is implied
by U1, is represented by the undirected graph U2 in Figure 1(ii), which does
not include y2. In addition, U2 does not imply any additional independence
relations that are not also implied by U1.

y1 − y2 − y3 − y4          y1 − y3 − y4

       (i)                     (ii)


Figure 1: (i) an undirected graph U1; (ii) an undirected graph U2 representing
the conditional independence structure induced on {y1, y3, y4} by U1 after
marginalizing y2.
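
The closure of undirected graphs under marginalization can be illustrated with a small sketch. It assumes the standard rule that marginalizing a vertex of an undirected graph joins its former neighbours pairwise; the function names `marginalize_ug` and `separated` and the adjacency-dictionary representation are illustrative choices, not notation from the paper.

```python
from collections import deque

def marginalize_ug(adj, v):
    """Return the undirected graph obtained by marginalizing vertex v:
    v is removed and its former neighbours are joined pairwise."""
    nbrs = adj[v]
    new = {a: {b for b in bs if b != v} for a, bs in adj.items() if a != v}
    for a in nbrs:
        for b in nbrs:
            if a != b:
                new[a].add(b)
    return new

def separated(adj, x, y, z):
    """True if every path from x to y in the undirected graph meets the set z."""
    seen, queue = {x}, deque([x])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b == y:
                return False
            if b not in seen and b not in z:
                seen.add(b)
                queue.append(b)
    return True

# The chain U1 of Figure 1(i): y1 - y2 - y3 - y4
U1 = {"y1": {"y2"}, "y2": {"y1", "y3"}, "y3": {"y2", "y4"}, "y4": {"y3"}}
U2 = marginalize_ug(U1, "y2")              # the chain y1 - y3 - y4
holds = separated(U2, "y1", "y4", {"y3"})  # y1 and y4 are separated by {y3}
```

Under these assumptions the graph returned for U1 is the chain of Figure 1(ii), and the separation of y1 from y4 by {y3} carries over from U1 to U2.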

By contrast, Markov models based on DAGs are not closed in this way.
Consider the DAG, D1, shown in Figure 2(i). This DAG implies the following
independence relations:

t1 ⊥⊥ {t2, y2}        t2 ⊥⊥ {t1, y1}        (†)

DAG D1 could be used to represent two successive experiments where:

• t1 and t2 are two completely randomized treatments, and hence there
are no edges that point towards either of these variables;

• y1 and y2 represent two outcomes of interest;

• h is the underlying health status of the patient;

• the first treatment has no effect on the second outcome, hence there is
no edge t1 → y2.
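
The independence relations (†) can be checked mechanically with a d-separation test. The sketch below uses the standard moralization criterion and assumes that D1 has exactly the edges t1 → y1, h → y1, h → y2 and t2 → y2. This edge set is consistent with the description above, but Figure 2 is not reproduced here, so it is an assumption. The function names are illustrative.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """All vertices with a directed path into `nodes`, together with `nodes`."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(parents, xs, ys, zs):
    """Moralization test: xs and ys are d-separated by zs iff zs separates them
    in the moral graph of the sub-DAG induced by an(xs | ys | zs)."""
    keep = ancestors(parents, set(xs) | set(ys) | set(zs))
    adj = {v: set() for v in keep}
    for v in keep:
        ps = list(parents.get(v, ()))
        for p in ps:                      # keep parent-child links
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(ps, 2):  # join parents of a common child
            adj[p].add(q)
            adj[q].add(p)
    seen, stack = set(xs), list(xs)
    while stack:
        for b in adj[stack.pop()]:
            if b in ys:
                return False
            if b not in seen and b not in zs:
                seen.add(b)
                stack.append(b)
    return True

# Assumed edge set of D1, stored as child -> list of parents.
D1 = {"y1": ["t1", "h"], "y2": ["t2", "h"]}

first = d_separated(D1, {"t1"}, {"t2", "y2"}, set())    # t1 independent of {t2, y2}
second = d_separated(D1, {"t2"}, {"t1", "y1"}, set())   # t2 independent of {t1, y1}
```

With this edge set both relations in (†) hold, while t1 and y2 become d-connected once y1 is conditioned on, because y1 is a collider on the path through h.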

There is no DAG containing only the vertices {t1, y1, t2, y2} which
represents the independence relations (†) and does not also imply some other
independence relation that is not implied by D1. Consequently, any DAG
model on these vertices will either fail to represent an independence
relation, and hence contain ‘too many’ edges, or will impose some additional
independence restriction that is not implied by D1.

Suppose that the patient’s underlying health status h is not observed,
and the generating structure D1 is unknown. In these circumstances, a
conventional analysis would consider DAG models containing edges that are
consistent with the known time order of the variables. Given sufficient data,
any DAG imposing an extra independence relation will be rejected by a
likelihood-ratio test, and a DAG representing some subset of the
independence relations, such as the DAG in Figure 2(ii), will be chosen. However,
any such graph will contain the extra edge t1 → y2, and fail to represent
the marginal independence of these variables. Thus such an analysis would
conceal the fact that the first treatment does not affect the second outcome.
This is also an undesirable result from a purely predictive perspective, since
a model which incorporated this marginal independence constraint would be
more parsimonious.

Moreover, even if we were to consider DAGs that were compatible with
a non-temporal ordering of {y1, y2, t1, t2}, we would still be unable to find
a DAG which represented all and only the independence relations in (†).
An analysis based on undirected graphs, or chain graphs, under the LWF
global Markov property, would still include additional edges. (It is possible
to represent the independence structure of D1 via a chain graph with the
AMP Markov property, but this does not hold for an arbitrary DAG under
marginalization. See Section 9.4.)

Figure 2: (i) a directed acyclic graph D1, representing a hypothesis
concerning two completely randomized treatments and two outcomes (see text for
further description); (ii) the DAG model D2 resulting from a conventional
analysis of {t1, y1, t2, y2}.

One response to this situation is to consider latent variable (LV) models,
since h is a hidden variable in the model described by D1. Though this is
certainly a possible approach in circumstances where much is known about
the generating process, it seems unwise in other situations since LV models
lack almost all of the desirable statistical properties attributed to graphical
models (without hidden variables) above. In particular:

• LV models are not always identified;

• the likelihood may be multi-modal;

• any inference may be very sensitive to assumptions made about the
unobserved variables;

• LV models with hidden variables have been proved not to be curved
exponential families even in very simple cases (Geiger et al., 2001);

• LV models do not in general have a well-defined dimension for use in
scores such as BIC, or χ²-tests (this follows from the previous point);

• the set of distributions associated with an LV model may be difficult
to characterize (see Settimi and Smith, 1999, 1998; Geiger et al., 2001,
for recent results);

• LV models do not form a tractable search space: an arbitrary number
of hidden variables may be incorporated, so the class contains infinitely
many different structures relating a finite set of variables.

This presents the modeller with a dilemma: in many contexts it is clearly
unrealistic  to  assume  that  there  are  no  unmeasured  confounding  variables, 
and  misleading  analyses  may  result  (as  shown  above).  However,  models  that 
explicitly  include  hidden  variables  may  be  very  hard  to  work  with  for  the 
reasons  just  given. 

The  class  of  ancestral  graph  Markov  models  described  in  this  paper  is 
intended  to  provide  a  partial  resolution  to  this  conundrum.  This  class  extends 
the  class  of  DAG  models,  but  is  closed  under  marginalization.  In  addition, 
as  we  show  in  this  paper,  at  least  in  the  Gaussian  case  these  models  retain 
many  of  the  desirable  properties  possessed  by  standard  graphical  models.  It 
should be noted, however, that two different DAG models may lead to the
same  ancestral  graph,  so  in  this  sense  information  is  lost. 

Up  to  this  point  we  have  considered  closure  under  marginalization.  There 
is a similar notion of closure under conditioning that is motivated by
considering selection effects (see Cox and Wermuth, 1996; Cooper, 1995). UG
Markov models are closed under conditioning; DAG models are not. The
class  of  Markov  models  described  here  is  also  closed  under  conditioning. 

The  remainder  of  the  paper  is  organized  as  follows: 

We introduce basic graphical notation and definitions in Section 2.
Section 3 introduces the class of ancestral graphs and the associated global
Markov  property.  We  also  define  the  subclass  of  maximal  ancestral  graphs, 
which  obey  a  pairwise  Markov  property. 

In Section 4 we formally define the operation of marginalizing and
conditioning for independence models, and a corresponding graphical
transformation. Theorem 4.18 establishes that the independence model associated
with the transformed graph is the same as the model resulting from applying
the operations of marginalizing and conditioning to the independence model
given by the original graph. It is also shown that the graphical
transformations commute (Theorem 4.20).

Two extension results are proved in Section 5. It is first shown that by
adding edges a non-maximal graph may be made maximal and this
extension is unique (Theorem 5.1). Second, it is demonstrated that a maximal
graph may be made complete (so that there is an edge between every pair
of vertices) by a sequence of edge additions that preserve maximality
(Theorem 5.6). In Section 6 it is shown that every maximal ancestral graph may
be obtained by transforming a DAG, the structure of which bears a simple
relation to the original ancestral graph (Theorem 6.4). Consequently, every
independence model associated with an ancestral graph may be obtained by
applying the operations of marginalizing and conditioning to some
independence model given by a DAG.

Section  7  relates  the  operations  of  marginalizing  and  conditioning  that 
have been defined for independence models to probability distributions.
Theorem 7.6 then shows that the global Markov property for ancestral graphs is
complete. 

In  Section  8  we  define  a  Gaussian  parametrization  of  an  ancestral  graph. 
It  is  shown  in  Theorem  8.7  that  each  parameter  is  either  a  concentration,  a 
regression coefficient, or a residual variance or covariance. Theorem 8.14
establishes that if the graph is maximal then the set of Gaussian distributions
associated with the parametrization is exactly the set of Gaussian
distributions which obey the global Markov property for the graph.

Section 9 contrasts the class of ancestral graphs to summary graphs,
introduced by Wermuth et al. (1994), and MC-graphs, introduced by Koster
(1999a). Finally, Section 10 contains a brief discussion.

2  Basic  Definitions  and  Concepts 

In this section we introduce notation and terminology for describing
independence models and graphs.

2.1  Independence  Models 

An independence model ℑ over a set V is a set of triples ⟨X, Y | Z⟩ where
X, Y and Z are disjoint subsets of V; X and Y are non-empty. The triple
⟨X, Y | Z⟩ is interpreted as saying that X is independent of Y given Z. In
Section 7 we relate this definition to conditional independence in a probability
distribution. (As defined here, an ‘independence model’ need not correspond
to the set of independence relations holding in any probability distribution.)

2.1.1  Graphical  independence  models 

A graph G is an ordered pair (V, E) where V is a set of vertices and E is a set
of edges. A separation criterion C associates an independence model ℑ_C(G)
with graph G:

⟨X, Y | Z⟩ ∈ ℑ_C(G) ⟺ X is separated from Y by Z in G under criterion C.

Such  a  criterion  C  is  also  referred  to  as  a  global  Markov  property.  The 
d-separation  criterion  introduced  by  Pearl  (1988)  is  an  example  of  such  a 
criterion. 

2.2  Mixed  Graphs 

A mixed graph is a graph containing three types of edge: undirected (−),
directed (→), and bi-directed (↔). We use the following terminology to
describe relations between variables in such a graph:

if α − β in G then α is a neighbour of β, and α ∈ ne_G(β);
if α ↔ β in G then α is a spouse of β, and α ∈ sp_G(β);
if α → β in G then α is a parent of β, and α ∈ pa_G(β);
if α ← β in G then α is a child of β, and α ∈ ch_G(β).

Note  that  the  three  edge  types  should  be  considered  as  distinct  symbols,  and 
in  particular, 

α − β ≠ α ⇄ β ≠ α ↔ β.

If there is an edge α → β, or α ↔ β, then there is said to be an arrowhead at
β on this edge. If there is at least one edge between a pair of vertices then
these  vertices  are  adjacent.  We  do  not  allow  a  vertex  to  be  adjacent  to  itself. 

A graph G′ = (V′, E′) is a subgraph of G = (V, E) if V′ ⊆ V and every
edge in G′ is present in G. The induced subgraph of G over A, denoted G_A,
has vertex set A, and contains every edge present in G between the vertices
in A. (See Appendix A.1 for more formal statements of these definitions.)
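
The terminology above can be made concrete with a small data structure. This is a minimal sketch: the class name `MixedGraph`, the encoding of edges as `(a, kind, b)` triples and the method names are illustrative assumptions, not notation from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class MixedGraph:
    """Hypothetical container for a mixed graph. Edges are (a, kind, b) triples
    with kind '--' (a − b), '->' (a → b) or '<->' (a ↔ b)."""
    vertices: set
    edges: list = field(default_factory=list)

    def _symmetric(self, kind, b):
        # Other endpoints of edges of a symmetric kind that touch b.
        return ({x for x, k, y in self.edges if k == kind and y == b}
                | {y for x, k, y in self.edges if k == kind and x == b})

    def ne(self, b):
        """Neighbours of b: vertices a with a − b."""
        return self._symmetric("--", b)

    def sp(self, b):
        """Spouses of b: vertices a with a ↔ b."""
        return self._symmetric("<->", b)

    def pa(self, b):
        """Parents of b: vertices a with a → b."""
        return {x for x, k, y in self.edges if k == "->" and y == b}

    def ch(self, b):
        """Children of b: vertices a with b → a."""
        return {y for x, k, y in self.edges if k == "->" and x == b}

    def adjacent(self, a, b):
        """True if at least one edge joins a and b."""
        return any({x, y} == {a, b} for x, _, y in self.edges)

    def induced_subgraph(self, A):
        """Keep the vertices in A and every edge with both endpoints in A."""
        A = set(A)
        return MixedGraph(A, [e for e in self.edges if e[0] in A and e[2] in A])

# Example: a − b → c ↔ d
G = MixedGraph({"a", "b", "c", "d"},
               [("a", "--", "b"), ("b", "->", "c"), ("c", "<->", "d")])
```

Storing the edge kind explicitly keeps the three edge types distinct, in line with the remark that they should be treated as distinct symbols.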

2.3  Paths  and  Edge  Sequences 

A sequence of edges between α and β in G is an ordered (multi)set of edges
⟨ε_1, ..., ε_n⟩, such that there exists a sequence of vertices (not necessarily
distinct) ⟨α ≡ ω_1, ..., ω_{n+1} ≡ β⟩ (n ≥ 0), where edge ε_i has endpoints
ω_i, ω_{i+1}. A sequence of edges for which the corresponding sequence of
vertices contains no repetitions is called a path. We will use bold greek (μ)
to denote paths and single edges, and fraktur (𝔰) to denote sequences. Note
that the result of concatenating two paths with a common endpoint is not
necessarily a path, though it is always a sequence. Paths and sequences
consisting of a single vertex, corresponding to a sequence of no edges, are
permitted for the purpose of simplifying proofs; such paths will be called
empty as the set of associated edges is empty.

We denote a subpath of a path π by π(ω_j, ω_{k+1}) ≡ ⟨ε_j, ..., ε_k⟩, and
likewise for sequences. Unlike a subpath, a subsequence is not uniquely
specified by the start and end vertices, hence the context will also make clear
which occurrence of each vertex in the sequence is referred to.

We define a path as a sequence of edges rather than vertices because the
latter does not specify a unique path when there may be two edges between
a given pair of vertices. (However, from Section 3 on we will only consider
graphs containing at most one edge between each pair of vertices.) A path
of the form α → ⋯ → β, on which every edge is of the form →, with the
arrowheads pointing towards β, is a directed path from α to β.

2.4  Ancestors  and  Anterior  Vertices 

A vertex α is said to be an ancestor of a vertex β if either there is a directed
path α → ⋯ → β from α to β, or α = β.

A vertex α is said to be anterior to a vertex β if there is a path μ on
which every edge is either of the form γ − δ, or γ → δ with δ between γ and
β, or α = β; i.e. there are no edges γ ↔ δ and there are no edges γ ← δ
pointing towards α. Such a path is said to be an anterior path from α to β.

We  apply  these  definitions  disjunctively  to  sets: 

an(X) = {α | α is an ancestor of β for some β ∈ X};
ant(X) = {α | α is anterior to β for some β ∈ X}.

Our  usage  of  the  terms  ‘ancestor’  and  ‘anterior’  differs  from  Lauritzen 
(1996),  but  follows  Frydenberg  (1990a). 
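
The sets an(X) and ant(X) can be computed by a backward search from X. The sketch below assumes edges are stored as `(a, kind, b)` triples with kind `'--'`, `'->'` or `'<->'`; this encoding and the helper name `_backward_closure` are illustrative assumptions. It relies on the fact that any edge sequence of the permitted kinds contains a path of the same kind, so reachability suffices.

```python
def _backward_closure(edges, targets, kinds):
    """Vertices from which `targets` can be reached along edges whose kind is
    in `kinds`, moving with the arrow on '->' and either way on '--'."""
    result, stack = set(targets), list(targets)
    while stack:
        b = stack.pop()
        for x, kind, y in edges:
            if kind not in kinds:
                continue
            preds = [x] if y == b else []
            if kind == "--" and x == b:
                preds.append(y)
            for p in preds:
                if p not in result:
                    result.add(p)
                    stack.append(p)
    return result

def an(edges, X):
    """Ancestors of X: directed edges only; X itself is included."""
    return _backward_closure(edges, X, {"->"})

def ant(edges, X):
    """Vertices anterior to X: undirected edges, and directed edges towards X."""
    return _backward_closure(edges, X, {"->", "--"})

# Example: a − b → c ↔ d → e
edges = [("a", "--", "b"), ("b", "->", "c"), ("c", "<->", "d"), ("d", "->", "e")]
an_c = an(edges, {"c"})     # b and c
ant_c = ant(edges, {"c"})   # a, b and c
```

In this example a is anterior to c but not an ancestor of c, and c is not anterior to e because the bi-directed edge c ↔ d cannot appear on an anterior path.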

Proposition 2.1 In a mixed graph G,

(i) if X ⊆ Y then ant(X) ⊆ ant(Y) and an(X) ⊆ an(Y);

(ii) X ⊆ ant(X) = ant(ant(X)) and X ⊆ an(X) = an(an(X));

(iii) ant(X ∪ Y) = ant(X) ∪ ant(Y) and an(X ∪ Y) = an(X) ∪ an(Y).

Proof: These properties follow directly from the definitions of an(·) and
ant(·). □


Proposition 2.2 If X and Y are disjoint sets of vertices in a mixed graph
G then

(i) ant(ant(X) \ Y) = ant(X);

(ii) an(an(X) \ Y) = an(X).

Proof: (i) Since X and Y are disjoint, X ⊆ ant(X) \ Y. By Proposition
2.1(i), ant(X) ⊆ ant(ant(X) \ Y). Conversely, ant(X) \ Y ⊆ ant(X) so
ant(ant(X) \ Y) ⊆ ant(ant(X)) = ant(X), by Proposition 2.1(i) and (ii).
The proof of (ii) is very similar. □

A directed path from α to β together with an edge β → α is called a
(fully) directed cycle. An anterior path from α to β together with an edge
β → α is called a partially directed cycle. A directed acyclic graph (DAG)
is  a  mixed  graph  in  which  all  edges  are  directed,  and  there  are  no  directed 
cycles. 

3  Ancestral  Graphs 

The class of mixed graphs is much larger than required for our purposes; in
particular, under natural separation criteria, it includes independence models
that  do  not  correspond  to  DAG  models  under  marginalizing  and  conditioning. 
We  now  introduce  the  subclass  of  ancestral  graphs. 


3.1  Definition  of  an  ancestral  graph 

An ancestral graph G is a mixed graph in which the following conditions hold
for all vertices α in G:

(i) α ∉ ant(pa(α) ∪ sp(α));

(ii) if ne(α) ≠ ∅ then pa(α) ∪ sp(α) = ∅.


In words, condition (i) requires that if α and β are joined by an edge with an
arrowhead at α, then α is not anterior to β. Condition (ii) requires that there
be no arrowheads present at a vertex which is an endpoint of an undirected
edge. Condition (i) implies that if α and β are joined by an edge with an
arrowhead at α, then α is not an ancestor of β. This is the motivation
for terming such graphs ‘ancestral’. (See also Corollary 3.10.) Examples of
ancestral and non-ancestral mixed graphs are shown in Figure 3.
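
Conditions (i) and (ii) translate directly into a check on a finite graph. The following is a minimal sketch, assuming edges are stored as `(a, kind, b)` triples with kind `'--'`, `'->'` or `'<->'`; the function names `anterior` and `is_ancestral` are illustrative.

```python
def anterior(edges, targets):
    """ant(targets): backward search along '--' edges and along '->' edges
    that point towards the targets."""
    result, stack = set(targets), list(targets)
    while stack:
        b = stack.pop()
        for x, kind, y in edges:
            preds = []
            if kind in ("--", "->") and y == b:
                preds.append(x)
            if kind == "--" and x == b:
                preds.append(y)
            for p in preds:
                if p not in result:
                    result.add(p)
                    stack.append(p)
    return result

def is_ancestral(vertices, edges):
    """Check conditions (i) and (ii) of the definition at every vertex."""
    for a in vertices:
        ne = {y if x == a else x for x, k, y in edges if k == "--" and a in (x, y)}
        sp = {y if x == a else x for x, k, y in edges if k == "<->" and a in (x, y)}
        pa = {x for x, k, y in edges if k == "->" and y == a}
        if a in anterior(edges, pa | sp):   # condition (i) fails
            return False
        if ne and (pa | sp):                # condition (ii) fails
            return False
    return True

ok = is_ancestral({"a", "b", "c"}, [("a", "->", "b"), ("b", "<->", "c")])
bad_i = is_ancestral({"a", "b"}, [("a", "->", "b"), ("a", "<->", "b")])
bad_ii = is_ancestral({"a", "b", "c"}, [("a", "--", "b"), ("c", "->", "b")])
```

The second example violates (i) because a is anterior to its spouse b; the third violates (ii) because b is an endpoint of an undirected edge and also has an arrowhead.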



Figure  3:  (a)  Mixed  graphs  that  are  not  ancestral;  (b)  ancestral  mixed  graphs. 


Lemma 3.1 In an ancestral graph, for every vertex α the sets ne(α), pa(α),
ch(α) and sp(α) are disjoint; thus there is at most one edge between any pair
of vertices.

Proof: ne(α), pa(α) and ch(α) are disjoint by condition (i). ne(α) ∩ sp(α) =
∅ by (ii) since at most one of these sets is non-empty. Finally (i) implies that
sp(α) ∩ pa(α) ⊆ sp(α) ∩ ant(α) = ∅, and likewise sp(α) ∩ ch(α) = ∅. □

Lemma 3.2 If G is an ancestral graph then the following hold:

(a) If α and β are adjacent in G and α ∈ an(β) then α → β.

(b) The configurations α − β ↔ γ and α − β ← γ do not occur (regardless
of whether α and γ are adjacent).

(c) There are no directed cycles or partially directed cycles.

Proof: (a) follows because condition (i) rules out α ← β or α ↔ β, while
(ii) rules out α − β. (b) is simply a restatement of condition (ii). (c) follows
because (i) rules out fully directed cycles, while the configuration → γ −
occurs in any partially directed cycle. □

If there is at most one edge between two vertices in a graph then
conditions (a), (b), and (c) in Lemma 3.2 are sufficient for G to be ancestral.

Corollary 3.3 In an ancestral graph an anterior path from α to β takes one
of three forms: α − ⋯ − β, α → ⋯ → β, or α − ⋯ − → ⋯ → β.

Proof: Follows from the definition of an anterior path and Lemma 3.2(b). □

Proposition 3.4 If G is an undirected graph, or a directed acyclic graph,
then G is an ancestral graph.

Proposition 3.5 If G is an ancestral graph and G′ is a subgraph of G, then
G′ is ancestral.

Proof: The definition of an ancestral graph only forbids certain
configurations of edges. If these do not occur in G then they do not occur in a
subgraph G′. □

3.2  Undirected  Edges  in  an  Ancestral  Graph 

Let un_G = {α | pa_G(α) ∪ sp_G(α) = ∅} be the set of vertices at which no
arrowheads are present in G. Note that if ne_G(α) ≠ ∅ then, by condition (ii)
in the definition of an ancestral graph, α ∈ un_G, so un_G contains all endpoints
of undirected edges in G.

Proposition 3.6 If G is an ancestral graph, and G′ is a subgraph with the
same vertex set, then un_G ⊆ un_G′.

Proof: Since G′ has a subset of the edges in G, pa_G(α) ∪ sp_G(α) = ∅ implies
pa_G′(α) ∪ sp_G′(α) = ∅. □

Lemma 3.7 If G is an ancestral graph with vertex set V, then

if α ↔ β in G then α, β ∈ V \ un_G;
if α − β in G then α, β ∈ un_G;
if α → β in G then β ∈ V \ un_G.

Proof: Follows directly from definition of un_G and Lemma 3.2(b). □

Lemma 3.7 shows that any ancestral graph can be split into an undirected
graph G_{un_G}, and an ancestral graph containing no undirected edges,
G_{V \ un_G}; any edge between a vertex α ∈ un_G and a vertex β ∈ V \ un_G
takes the form α → β. See Figure 4. This result is useful in developing
parametrizations for the resulting independence models (see Section 8).
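
The decomposition described by Lemma 3.7 can be sketched as follows. The sketch assumes the input is already an ancestral graph with edges stored as `(a, kind, b)` triples (kind `'--'`, `'->'` or `'<->'`); the function name `decompose` and the returned tuple are illustrative choices.

```python
def decompose(vertices, edges):
    """Split an ancestral graph as in Lemma 3.7. Returns the set un_G, the
    edges inside un_G, the edges inside V minus un_G, and the edges that
    join the two parts."""
    has_head = set()
    for x, kind, y in edges:
        if kind in ("->", "<->"):
            has_head.add(y)      # arrowhead at y
        if kind == "<->":
            has_head.add(x)      # arrowhead at x as well
    un = set(vertices) - has_head
    within_un = [e for e in edges if e[0] in un and e[2] in un]
    within_rest = [e for e in edges if e[0] not in un and e[2] not in un]
    between = [e for e in edges if (e[0] in un) != (e[2] in un)]
    return un, within_un, within_rest, between

# Example ancestral graph: a − b, b → c, c ↔ d, a → d
V = {"a", "b", "c", "d"}
E = [("a", "--", "b"), ("b", "->", "c"), ("c", "<->", "d"), ("a", "->", "d")]
un, within_un, within_rest, between = decompose(V, E)
```

In the example un_G = {a, b}, the only edge inside un_G is undirected, the only edge outside it is bi-directed, and both connecting edges are directed out of un_G, as the lemma requires.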

Figure 4: Schematic showing decomposition of an ancestral graph into an
undirected graph and a graph containing no undirected edges.

Lemma 3.8 For an ancestral graph G,

(i) if α ∈ un_G then β ∈ ant_G(α) ⇒ α ∈ ant_G(β);

(ii) if α and β are such that α ≠ β, α ∈ ant_G(β) and β ∈ ant_G(α) then
α, β ∈ un_G, and there is a path joining α and β on which every edge is
undirected;

(iii) ant_G(α) \ an_G(α) ⊆ un_G.

Proof: (i) follows from Lemma 3.2(b) and Corollary 3.3. (ii) follows since
by Lemma 3.2(c) there are no partially directed cycles and thus the anterior
paths between α and β consist only of undirected edges, so α, β ∈ un_G by
Lemma 3.7. (iii) follows because if a vertex β is anterior to α, but not
an ancestor of α, then by Corollary 3.3 any anterior path starts with an
undirected edge, and the result follows from Lemma 3.7. □

Lemma 3.9 If G is an ancestral graph, and α, β are adjacent vertices in G
then:

(i) α − β ⟺ α ∈ ant_G(β), β ∈ ant_G(α);

(ii) α → β ⟺ α ∈ ant_G(β), β ∉ ant_G(α);

(iii) α ↔ β ⟺ α ∉ ant_G(β), β ∉ ant_G(α).

Proof: (i)(⇒) follows by the definition of anterior; (i)(⇐) by Lemma 3.8(ii)
and Lemma 3.2(b). Claim (ii)(⇒) follows by the definition of anterior and
property (i) of an ancestral graph; (ii)(⇐) follows because from Lemma 3.8(i),
β ∉ un_G, and so by Lemma 3.7 and property (i) of an ancestral graph,
α → β. (iii)(⇒) follows by property (i) of an ancestral graph; (iii)(⇐)
follows by definition of anterior. □

A direct consequence of Lemma 3.9 is that an ancestral graph is uniquely
determined by its adjacencies (or ‘skeleton’) and anterior relations. More
formally:

Corollary 3.10 If G1 and G2 are two ancestral graphs with the same vertex
set V, and adjacencies, then if ∀α, β ∈ V, adjacent in G1 and G2,

α ∈ ant_G1(β) ⟺ α ∈ ant_G2(β)

then G1 = G2.

Proof: Follows directly from Lemma 3.9. □

Note that this does not hold in general for non-ancestral graphs. See
Figure 5 for an example.
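
Lemma 3.9 suggests a simple way to recover the type of an edge from the anterior relations alone. The following sketch assumes edges are stored as `(a, kind, b)` triples with kind `'--'`, `'->'` or `'<->'`; the function names `anterior` and `edge_type` are illustrative.

```python
def anterior(edges, targets):
    """ant(targets): backward search along '--' edges and along '->' edges
    that point towards the targets."""
    result, stack = set(targets), list(targets)
    while stack:
        b = stack.pop()
        for x, kind, y in edges:
            preds = []
            if kind in ("--", "->") and y == b:
                preds.append(x)
            if kind == "--" and x == b:
                preds.append(y)
            for p in preds:
                if p not in result:
                    result.add(p)
                    stack.append(p)
    return result

def edge_type(edges, a, b):
    """Classify the edge between adjacent vertices a and b using only the
    anterior relations, following the three cases of Lemma 3.9."""
    a_ant_b = a in anterior(edges, {b})
    b_ant_a = b in anterior(edges, {a})
    if a_ant_b and b_ant_a:
        return (a, "--", b)
    if a_ant_b:
        return (a, "->", b)
    if b_ant_a:
        return (b, "->", a)
    return (a, "<->", b)

# An ancestral graph: every edge is recovered from the anterior relations.
E = [("a", "--", "b"), ("b", "->", "c"), ("c", "<->", "d"), ("a", "->", "d")]
recovered = [edge_type(E, x, y) for x, _, y in E]

# A non-ancestral graph: a ↔ c is misclassified because a is anterior to c.
E_bad = [("a", "->", "b"), ("b", "->", "c"), ("a", "<->", "c")]
misread = edge_type(E_bad, "a", "c")
```

For the ancestral example the recovered list equals the original edge list. For the non-ancestral example the rule returns a → c in place of a ↔ c, which illustrates why the corollary is restricted to ancestral graphs.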




Figure  5:  Two  pairs  of  graphs  that  share  the  same  adjacencies  and  anterior 
relations  between  adjacent  vertices,  and  yet  are  not  equivalent. 


3.3  Bi-Directed  Edges  in  an  Ancestral  Graph 

The following Lemma shows that the ancestor relation induces a partial
ordering on the bi-directed edges in an ancestral graph.

Lemma 3.11 Let G be an ancestral graph. The relation ≺ defined by:

α ↔ β ≺ γ ↔ δ if α, β ∈ an({γ, δ}) and {α, β} ≠ {γ, δ}

defines a strict (irreflexive) partial order on the bi-directed edges in G.

Proof: Transitivity of the relation ≺ follows directly from transitivity of the
ancestor relation. Suppose for a contradiction that α ↔ β ≺ γ ↔ δ ≺ α ↔ β,
but {α, β} ≠ {γ, δ}. Either α ∉ {γ, δ} or β ∉ {γ, δ}. Without loss of
generality, suppose the former. Since α ∈ an({γ, δ}) and γ, δ ∈ an({α, β}) it
then follows that either α ∈ an(β), or there is a directed cycle containing α
and γ or δ. In both cases condition (i) in the definition of an ancestral graph
is violated. □


Note  that  the  relation  given  by 

α ↔ β ≺* γ ↔ δ if (α ∈ an({γ, δ}) or β ∈ an({γ, δ})) and {α, β} ≠ {γ, δ}

does  not  give  an  ordering  on  the  bi-directed  edges  as  shown  by  the  ancestral 
graph  in  Figure  6.  This  is  significant  since  it  means  that  in  an  ancestral 
graph  it  is  not  possible  in  general  to  construct  ordered  blocks  of  vertices 
such  that  all  bi-directed  edges  are  within  blocks  and  all  directed  edges  are 
between  vertices  in  different  blocks  and  are  directed  in  accordance  with  the 
ordering. 




Figure  6:  An  ancestral  graph  which  cannot  be  arranged  in  ordered  blocks 
with  bi-directed  edges  within  blocks  and  edges  between  blocks  directed  in 
accordance  with  the  ordering.  (See  text  for  further  discussion.) 


3.4  The  Pathwise  m-separation  Criterion 

We now extend Pearl’s d-separation criterion (see Pearl, 1988), defined
originally for DAGs, to ancestral graphs.

A non-endpoint vertex ζ on a path is a collider on the path if the edges
preceding and succeeding ζ on the path have an arrowhead at ζ, i.e. → ζ ←,
↔ ζ ↔, ↔ ζ ←, → ζ ↔. A non-endpoint vertex ζ on a path which is not a
collider is a non-collider on the path. A path between vertices α and β in an
ancestral graph G is said to be m-connecting given a set Z (possibly empty),
with α, β ∉ Z, if:

(i) every non-collider on the path is not in Z, and
(ii) every collider on the path is in ant_G(Z).

If there is no path m-connecting α and β given Z, then α and β are said to
be m-separated given Z. Sets X and Y are m-separated given Z, if for every
pair α, β, with α ∈ X and β ∈ Y, α and β are m-separated given Z (X, Y, Z
are disjoint sets; X, Y are non-empty). We denote the independence model
resulting from applying the m-separation criterion to G by ℑ_m(G).

This is an extension of Pearl’s d-separation criterion to mixed graphs in
that in a DAG D, a path is d-connecting if and only if it is m-connecting. See
Figure 7(a) for an example. The formulation of this property leads directly
to:

Proposition 3.12 If G is an ancestral graph, and G′ is a subgraph with the
same vertex set, then ℑ_m(G) ⊆ ℑ_m(G′).

Proof: This holds because any path in G′ exists in G. □

Notice that it follows directly from Corollary 3.3 and Lemma 3.2(b) that
if γ is a collider on a path π in an ancestral graph G then γ ∈ ant_G(β) ⇒
γ ∈ an_G(β). Since the set of m-connecting paths will not change, strengthening
condition (ii) in the definition of m-separation to:

(ii)′ every collider on the path is in an_G(Z),

will not change the resulting independence model ℑ_m(G). This formulation
is closer to the original definition of d-separation for
directed acyclic graphs, since it does not use the anterior relation. The
only change is that the definitions of ‘collider’ and ‘non-collider’ have been
extended to allow for edges of the form − and ↔. (Also see the definition
of ‘h-separation’ introduced in Verma and Pearl (1990).)
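
The pathwise definition can be turned into a brute-force check that enumerates every path between two vertices. This is a minimal sketch for small graphs only: it assumes an ancestral graph (so that a path is determined by its vertex sequence), stores edges as `(a, kind, b)` triples with kind `'--'`, `'->'` or `'<->'`, and uses illustrative function names. It is exponential in the worst case and is not an algorithm taken from the paper.

```python
def build_marks(edges):
    """marks[(u, v)] is True when the edge between u and v has an arrowhead at v."""
    marks = {}
    for x, kind, y in edges:
        marks[(x, y)] = kind in ("->", "<->")
        marks[(y, x)] = kind == "<->"
    return marks

def anterior(marks, targets):
    """ant(targets): step from b to v when the edge between them has no
    arrowhead at v, that is v − b or v → b."""
    result, stack = set(targets), list(targets)
    while stack:
        b = stack.pop()
        for (u, v), head_at_v in marks.items():
            if u == b and not head_at_v and v not in result:
                result.add(v)
                stack.append(v)
    return result

def m_separated(edges, a, b, Z):
    """True if no path between a and b is m-connecting given Z (a, b not in Z)."""
    marks = build_marks(edges)
    adj = {}
    for u, v in marks:
        adj.setdefault(u, set()).add(v)
    ant_Z = anterior(marks, Z)

    def connecting(path):
        for prev, v, nxt in zip(path, path[1:], path[2:]):
            collider = marks[(prev, v)] and marks[(nxt, v)]
            if collider and v not in ant_Z:      # violates condition (ii)
                return False
            if not collider and v in Z:          # violates condition (i)
                return False
        return True

    def extend(path):
        if path[-1] == b:
            return connecting(path)
        return any(extend(path + [v])
                   for v in adj.get(path[-1], ()) if v not in path)

    return not extend([a])

# Example: x → w ← y together with w → z
E = [("x", "->", "w"), ("y", "->", "w"), ("w", "->", "z")]
sep_empty = m_separated(E, "x", "y", set())   # collider w blocks the path
sep_z = m_separated(E, "x", "y", {"z"})       # w is anterior to z, path opens
```

In the example x and y are m-separated given the empty set, but conditioning on z, to which the collider w is anterior, makes the path x → w ← y m-connecting.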

3.4.1  Properties  of  m-connecting  paths 

We  now  prove  two  Lemmas  giving  properties  of  m-connecting  paths  that  we 
will  exploit  in  Section  3.6. 

Lemma 3.13 If π is a path m-connecting α and β given Z in an ancestral
graph G then every vertex on π is in ant({α, β} ∪ Z).


Figure 7: Example of global Markov properties. (a) An ancestral graph G,
thicker edges form a path m-connecting x and y given {z}; (b) the subgraph
G_ant({x,y,z}); (c) the augmented graph (G_ant({x,y,z}))^a, in which x and y are not
separated by {z}.

Proof: Suppose γ is on π and is not anterior to α or β. Then, on each of the
subpaths π(α, γ) and π(γ, β), there is at least one edge with an arrowhead
pointing towards γ along the subpath. Let φ_αγ and φ_γβ be the vertices at
which such arrowheads occur that are closest to γ on the respective subpaths.
There are now three cases:

If γ ≠ φ_γβ then π(γ, φ_γβ) is an anterior path from γ to φ_γβ. It further
follows from Lemma 3.2(b) and Corollary 3.3 that φ_γβ is a collider
on π, hence anterior to Z, since π is m-connecting given Z. Hence
γ ∈ ant(Z).

If γ ≠ φ_αγ then by a symmetric argument to the previous case it follows
that γ is anterior to φ_αγ, and φ_αγ is a collider on π and thus anterior
to Z. Thus in this case, γ ∈ ant(Z).

If φ_αγ = γ = φ_γβ then γ is a collider on π, hence anterior to Z.
□

Lemma 3.14 Let $\mathcal{G}$ be an ancestral graph containing disjoint sets of vertices $X$, $Y$, $Z$ ($Z$ may be empty). If there are vertices $\alpha \in X$ and $\beta \in Y$ joined by a path $\mu$ on which no non-collider is in $Z$ and every collider is in $\mathrm{ant}(X \cup Y \cup Z)$ then there exist vertices $\alpha^* \in X$, $\beta^* \in Y$ such that $\alpha^*$ and $\beta^*$ are m-connected given $Z$ in $\mathcal{G}$.

Figure 8: Illustration of Lemma 3.14: (a) a path on which every vertex is an ancestor of $\alpha$ or $\beta$; (b) a path m-connecting $\alpha$ and $\beta$ given $\emptyset$.

Proof: Let $\mu^*$ be a path which contains the minimum number of colliders of any path between some vertex $\alpha^* \in X$ and some vertex $\beta^* \in Y$ on which no non-collider is in $Z$ and every collider is in $\mathrm{ant}(X \cup Y \cup Z)$. $\mu^*$ is guaranteed to exist since the path $\mu$ described in the Lemma has this form. In order to show that $\mu^*$ m-connects $\alpha^*$ and $\beta^*$ given $Z$ it is sufficient to show that every collider on $\mu^*$ is in $\mathrm{ant}(Z)$.

Suppose for a contradiction that there is a collider $\gamma$ on $\mu^*$ and $\gamma \notin \mathrm{ant}(Z)$. By construction $\gamma \in \mathrm{ant}(X \cup Y \cup Z)$, so either $\gamma \in \mathrm{ant}(X) \setminus \mathrm{ant}(Z)$ or $\gamma \in \mathrm{ant}(Y) \setminus \mathrm{ant}(Z)$. Suppose the former; then there is a directed path $\pi$ from $\gamma$ to some vertex $\alpha' \in X$. Let $\delta$ be the vertex closest to $\beta^*$ on $\mu^*$ which is also on $\pi$. By construction the paths $\mu^*(\delta, \beta^*)$ and $\pi(\delta, \alpha')$ do not intersect except at $\delta$. Hence concatenating these subpaths forms a path which satisfies the conditions on $\mu^*$ but has fewer colliders than $\mu^*$, which is a contradiction. The case where $\gamma \in \mathrm{ant}(Y) \setminus \mathrm{ant}(Z)$ is symmetric. □

Corollary 3.15 In an ancestral graph $\mathcal{G}$, there is a path $\mu$ between $\alpha$ and $\beta$ on which no non-collider is in a set $Z$ ($\alpha, \beta \notin Z$) and every collider is in $\mathrm{ant}(\{\alpha,\beta\} \cup Z)$ if and only if there is a path m-connecting $\alpha$ and $\beta$ given $Z$ in $\mathcal{G}$.

Proof: One direction is immediate and the other is a special case of Lemma 3.14 with $X = \{\alpha\}$, $Y = \{\beta\}$. □

This Corollary shows that condition (ii) in the definition of m-separation can be weakened to:

(ii)'' every collider on the path is in $\mathrm{ant}(\{\alpha,\beta\} \cup Z)$

without changing the resulting independence model (for ancestral graphs).

3.4.2 Formulation via sequences

Koster (2000) shows that if the separation criterion is applied to sequences of edges (which may include repetitions of the same edge) as opposed to paths, then some simplification is possible. Under this formulation vertices $\alpha$ and $\beta$ in a mixed graph $\mathcal{G}$ are said to be m-connected given a set $Z$ if there is a sequence $s$ for which

(i)* every non-collider on $s$ is not in $Z$, and

(ii)* every collider on $s$ is in $Z$.

The definitions of collider and non-collider remain unchanged, but are applied to edges occurring in sequences, so $\alpha \to \beta \leftarrow \alpha$ forms a collider. Koster (2000) proves that this criterion is identical to the m-separation criterion defined here for paths: the proof is based on the fact that there is a directed path from a collider $\gamma$ to a vertex $\zeta \in Z$ if and only if there is a sequence of the form $\gamma \to \cdots \to \zeta \leftarrow \cdots \leftarrow \gamma$.

We  do  not  make  use  of  this  criterion  in  this  paper,  as  paths,  rather  than 
sequences,  are  fundamental  to  our  main  construction  (see  Section  4.2.3). 

3.5 The Augmentation m*-separation Criterion

The global Markov property for DAGs may be formulated via separation in an undirected graph, obtained from the original DAG by first forming a subgraph and then adding undirected edges between non-adjacent vertices that share a common child, a process known as 'moralizing'. (See Lauritzen (1996), p. 47 for details.) In this subsection we formulate the global Markov property for ancestral mixed graphs in this way. In the next subsection the resulting independence model is shown to be equivalent to that obtained via m-separation. It is useful to have two formulations of the Markov property because some proofs are simpler using one while other proofs are simpler using the other.

3.5.1 The augmented graph $(\mathcal{G})^a$

Two vertices $\alpha$ and $\beta$ in an ancestral graph $\mathcal{G}$ are said to be collider connected if there is a path from $\alpha$ to $\beta$ in $\mathcal{G}$ on which every vertex except the endpoints is a collider; such a path is called a collider path. (Koster (1999b) refers to such a path as a 'pure collision path'.) Note that if there is a single edge between $\alpha$ and $\beta$ in the graph then $\alpha$ and $\beta$ are (vacuously) collider connected.

The augmented graph, denoted $(\mathcal{G})^a$, derived from the mixed graph $\mathcal{G}$ is an undirected graph with the same vertex set as $\mathcal{G}$ such that

$$\gamma - \delta \text{ in } (\mathcal{G})^a \iff \gamma \text{ and } \delta \text{ are collider connected in } \mathcal{G}.$$

3.5.2 Definition of m*-separation

Sets $X$ and $Y$ are said to be m*-separated by $Z$ if $X$ and $Y$ are separated by $Z$ in $(\mathcal{G}_{\mathrm{ant}(X \cup Y \cup Z)})^a$ ($X$, $Y$, $Z$ are disjoint sets; $X$, $Y$ are non-empty). Otherwise $X$ and $Y$ are said to be m*-connected given $Z$. The resulting independence model is denoted by $\mathfrak{I}_{m^*}(\mathcal{G})$. See Figure 7(b),(c) for an example.

When applied to DAGs, or UGs, the augmentation criterion presented here is equivalent to the Lauritzen-Wermuth-Frydenberg moralization criterion. (See Section 9.4 for discussion of chain graphs.)
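The augmentation criterion can be illustrated by a short Python sketch. It is not part of the original development: the edge encoding (triples `(u, kind, v)` with `kind` one of `"->"`, `"<->"`, `"--"`) and the function names `build_marks`, `anterior`, `augment` and `m_star_separated` are assumptions made purely for illustration. The sketch forms the anterior set, restricts to the induced subgraph, joins collider-connected vertices, and then tests ordinary separation in the resulting undirected graph. Collider connections are found by searching over walks, which is harmless here because a walk whose interior vertices are all colliders can always be shortened to a path with the same property.

```python
from collections import deque

def build_marks(edges):
    """Map each ordered adjacent pair (u, v) to the mark at v: '>' (arrowhead) or '-' (tail)."""
    marks = {}
    for u, kind, v in edges:
        if kind == "->":        # u -> v
            marks[(u, v)], marks[(v, u)] = ">", "-"
        elif kind == "<->":     # u <-> v
            marks[(u, v)], marks[(v, u)] = ">", ">"
        elif kind == "--":      # u -- v
            marks[(u, v)], marks[(v, u)] = "-", "-"
        else:
            raise ValueError(f"unknown edge kind: {kind}")
    return marks

def anterior(marks, targets):
    """Vertices with an anterior path to some target (the targets themselves included)."""
    result, queue = set(targets), deque(targets)
    while queue:
        v = queue.popleft()
        for (a, u), mark_at_u in marks.items():
            # u -- v or u -> v: the edge between u and v has a tail at u
            if a == v and mark_at_u == "-" and u not in result:
                result.add(u)
                queue.append(u)
    return result

def augment(marks, keep):
    """Adjacency sets of the augmented graph of the subgraph induced by `keep`."""
    sub = {(u, v): m for (u, v), m in marks.items() if u in keep and v in keep}
    adj = {v: set() for v in keep}
    for start in keep:
        # `frontier` holds vertices entered through an arrowhead; only these
        # can serve as interior colliders of a collider path from `start`.
        frontier, seen = deque(), set()
        for (u, w), mark_at_w in sub.items():
            if u == start:
                adj[start].add(w)
                if mark_at_w == ">" and w not in seen:
                    seen.add(w)
                    frontier.append(w)
        while frontier:
            w = frontier.popleft()
            for (u, z), mark_at_z in sub.items():
                # leave w only along an edge that also has an arrowhead at w
                if u != w or z == start or sub[(z, w)] != ">":
                    continue
                adj[start].add(z)
                if mark_at_z == ">" and z not in seen:
                    seen.add(z)
                    frontier.append(z)
    return adj

def m_star_separated(edges, X, Y, Z):
    """True if X and Y are separated by Z in the augmented anterior subgraph."""
    marks = build_marks(edges)
    keep = anterior(marks, set(X) | set(Y) | set(Z))
    adj = augment(marks, keep)
    reached, queue = set(X), deque(X)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in Z and w not in reached:
                reached.add(w)
                queue.append(w)
    return not (reached & set(Y))

# Hypothetical test case: the DAG a -> x <- t -> y <- b.
dag = [("a", "->", "x"), ("t", "->", "x"), ("t", "->", "y"), ("b", "->", "y")]
print(m_star_separated(dag, {"x"}, {"y"}, {"t"}))
print(m_star_separated(dag, {"a"}, {"b"}, {"x", "y"}))
```

Under these assumptions the first call reports separation, while the second does not: conditioning on the two colliders $x$ and $y$ brings them into the anterior set, and augmentation then joins $a$ to $t$ and $t$ to $b$.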

3.5.3 Minimal m*-connecting paths

If there is an edge $\gamma - \delta$ in $(\mathcal{G})^a$, but there is no edge between $\gamma$ and $\delta$ in $\mathcal{G}$, then the edge is said to be augmented. A path $\mu$ connecting $x$ and $y$ given $Z$ is said to be minimal if there is no other such path which connects $x$ and $y$ given $Z$ but has fewer edges than $\mu$.

We now prove a property of minimal paths that is used in the next section:

Lemma 3.16 Let $\mathcal{G}$ be an ancestral graph. If $\mu$ is a minimal path connecting $\alpha$ and $\beta$ given $Z$ in $(\mathcal{G})^a$, then a collider path in $\mathcal{G}$ associated with an augmented edge $\gamma - \delta$ on $\mu$ has no vertex in common with $\mu$, or any collider path associated with another augmented edge on $\mu$, except possibly $\gamma$ or $\delta$.

Proof: Suppose that $\gamma - \delta$ and $\epsilon - \phi$ are two augmented edges, occurring in that order on $\mu$, and that the associated collider paths have in common a vertex which is not an endpoint of these paths. Then $\gamma$ and $\phi$ are adjacent in $(\mathcal{G})^a$. Thus a shorter path may be constructed by concatenating $\mu(\alpha, \gamma)$, $\gamma - \phi$ and $\mu(\phi, \beta)$, which is a contradiction. Likewise suppose that $\kappa$ is a vertex on a collider path between $\gamma$ and $\delta$ which also occurs on $\mu$. Then $\kappa$ occurs either before $\gamma$ or after $\delta$ on $\mu$. Suppose the former; then since $\kappa - \delta$ in $(\mathcal{G})^a$, a shorter path may be formed by concatenating $\mu(\alpha, \kappa)$, $\kappa - \delta$ and $\mu(\delta, \beta)$. The case where $\kappa$ occurs after $\delta$ is similar. □

3.6 Equivalence of m-separation and m*-separation

Lemma 3.17 In an ancestral graph $\mathcal{G}$ suppose that $\mu$ is a path which m-connects $\alpha$ and $\beta$ given $Z$. The sequence of non-colliders on $\mu$ forms a path connecting $\alpha$ and $\beta$ in $(\mathcal{G}_{\mathrm{ant}(\{\alpha,\beta\} \cup Z)})^a$.

Proof: By Lemma 3.13 all the vertices on $\mu$ are in $\mathcal{G}_{\mathrm{ant}(\{\alpha,\beta\} \cup Z)}$. Suppose that $\omega_i$ and $\omega_{i+1}$ ($1 \le i \le k-1$) are the successive non-colliders on $\mu$. The subpath $\mu(\omega_i, \omega_{i+1})$ consists entirely of colliders (apart from its endpoints), hence $\omega_i$ and $\omega_{i+1}$ are adjacent in $(\mathcal{G}_{\mathrm{ant}(\{\alpha,\beta\} \cup Z)})^a$. Similarly $\omega_1$ and $\omega_k$ are adjacent to $\alpha$ and $\beta$ respectively in $(\mathcal{G}_{\mathrm{ant}(\{\alpha,\beta\} \cup Z)})^a$. □

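Theorem 3.18 For an ancestral graph $\mathcal{G}$, $\mathfrak{I}_{m^*}(\mathcal{G}) = \mathfrak{I}_m(\mathcal{G})$.

We break the proof into two pieces:

Proof: $\mathfrak{I}_{m^*}(\mathcal{G}) \subseteq \mathfrak{I}_m(\mathcal{G})$

We proceed by showing that if $\langle X, Y \mid Z \rangle \notin \mathfrak{I}_m(\mathcal{G})$ then $\langle X, Y \mid Z \rangle \notin \mathfrak{I}_{m^*}(\mathcal{G})$. If $\langle X, Y \mid Z \rangle \notin \mathfrak{I}_m(\mathcal{G})$ then there are vertices $\alpha \in X$, $\beta \in Y$ such that there is an m-connecting path $\mu$ between $\alpha$ and $\beta$ given $Z$ in $\mathcal{G}$. By Lemma 3.17 the non-colliders on $\mu$ form a path $\mu^*$ connecting $\alpha$ and $\beta$ in $(\mathcal{G}_{\mathrm{ant}(X \cup Y \cup Z)})^a$. Since $\mu$ is m-connecting, no non-collider on $\mu$ is in $Z$, hence no vertex on $\mu^*$ is in $Z$. Thus $\langle X, Y \mid Z \rangle \notin \mathfrak{I}_{m^*}(\mathcal{G})$. □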


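Proof: $\mathfrak{I}_m(\mathcal{G}) \subseteq \mathfrak{I}_{m^*}(\mathcal{G})$

We show that if $\langle X, Y \mid Z \rangle \notin \mathfrak{I}_{m^*}(\mathcal{G})$ then $\langle X, Y \mid Z \rangle \notin \mathfrak{I}_m(\mathcal{G})$. If $\langle X, Y \mid Z \rangle \notin \mathfrak{I}_{m^*}(\mathcal{G})$ then there are vertices $\alpha \in X$, $\beta \in Y$ such that there is a minimal path $\pi$ connecting $\alpha$ and $\beta$ in $(\mathcal{G}_{\mathrm{ant}(X \cup Y \cup Z)})^a$ on which no vertex is in $Z$. Our strategy is to replace each augmented edge on $\pi$ with a corresponding collider path in $\mathcal{G}_{\mathrm{ant}(X \cup Y \cup Z)}$ and replace the other edges on $\pi$ with the corresponding edge in $\mathcal{G}$. It follows from Lemma 3.16 that the resulting sequence of edges forms a path from $\alpha$ to $\beta$ in $\mathcal{G}$, which we denote $\nu$. Further, any non-collider on $\nu$ is a vertex on $\pi$ and hence not in $Z$. Finally, since all vertices in $\nu$ are in $\mathcal{G}_{\mathrm{ant}(X \cup Y \cup Z)}$ it follows that every collider is in $\mathrm{ant}(X \cup Y \cup Z)$. Thus by Lemma 3.14 there are vertices $\alpha^* \in X$ and $\beta^* \in Y$ such that $\alpha^*$ and $\beta^*$ are m-connected given $Z$ in $\mathcal{G}$. Thus $\langle X, Y \mid Z \rangle \notin \mathfrak{I}_m(\mathcal{G})$. □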




3.7  Maximal  Ancestral  Graphs 

Independence models described by DAGs and undirected graphs satisfy pairwise Markov properties with respect to these graphs, hence every missing edge corresponds to a conditional independence (see Lauritzen (1996), p. 32). This is not true in general for an arbitrary ancestral graph, as shown by the graph in Figure 9(a).


Figure 9: (a) The simplest example of a non-maximal ancestral graph: $\gamma$ and $\delta$ are not adjacent, but are m-connected given every subset of $\{\alpha, \beta\}$, hence $\mathfrak{I}_m(\mathcal{G}) = \emptyset$; (b) an extension of the graph in (a) with the same (trivial) independence model.

This motivates the following definition: an ancestral graph $\mathcal{G}$ is said to be maximal if, for every pair of vertices $\alpha$, $\beta$, if $\alpha$ and $\beta$ are not adjacent in $\mathcal{G}$ then there is a set $Z$ ($\alpha, \beta \notin Z$) such that $\langle \{\alpha\}, \{\beta\} \mid Z \rangle \in \mathfrak{I}_m(\mathcal{G})$. Thus a graph is maximal if every missing edge corresponds to at least one independence in the corresponding independence model.

Proposition 3.19 If $\mathcal{G}$ is an undirected graph, or a directed acyclic graph, then $\mathcal{G}$ is maximal.

Proof: Follows directly from the existence of pairwise Markov properties for DAGs and undirected graphs. □

The use of the term 'maximal' is motivated by the following:

Proposition 3.20 If $\mathcal{G} = (V, E)$ is a maximal ancestral graph, and $\mathcal{G}$ is a subgraph of $\mathcal{G}^* = (V, E^*)$, then $\mathfrak{I}_m(\mathcal{G}) = \mathfrak{I}_m(\mathcal{G}^*)$ implies $\mathcal{G} = \mathcal{G}^*$.

Proof: If some pair $\alpha$, $\beta$ are adjacent in $\mathcal{G}^*$ but not in $\mathcal{G}$, then in $\mathcal{G}^*$, $\alpha$ and $\beta$ are m-connected given any subset of $V \setminus \{\alpha, \beta\}$. Hence $\mathfrak{I}_m(\mathcal{G}) \neq \mathfrak{I}_m(\mathcal{G}^*)$. □

Hence maximal ancestral graphs are maximal in the sense that no additional edge may be added to the graph without changing the independence model. The following Theorem gives the converse.


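Theorem 5.1 If $\mathcal{G}$ is an ancestral graph then there exists a unique maximal ancestral graph $\bar{\mathcal{G}}$ formed by adding $\leftrightarrow$ edges to $\mathcal{G}$ such that $\mathfrak{I}_m(\bar{\mathcal{G}}) = \mathfrak{I}_m(\mathcal{G})$.

We postpone the proof of this Theorem until Section 5.1, since it follows directly from another result. In Corollary 5.3 we show that a maximal ancestral graph satisfies the following:

Pairwise Markov property

If there is no edge between $\alpha$ and $\beta$ in $\mathcal{G}$ then

$$\langle \{\alpha\}, \{\beta\} \mid \mathrm{ant}(\{\alpha, \beta\}) \setminus \{\alpha, \beta\} \rangle \in \mathfrak{I}_m(\mathcal{G}).$$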


3.8  Complete  Ancestral  Graphs 

An ancestral graph is complete if there is an edge between every pair of distinct vertices. A graph is said to be transitive if $\alpha \to \beta \to \gamma$ implies $\alpha \to \gamma$. Andersson et al. (1995, 1997), and Andersson and Perlman (1998) study properties of independence models based on transitive DAGs.

Lemma 3.21 If $\mathcal{G}$ is a complete ancestral graph then

(i) $\mathcal{G}$ is transitive;

(ii) the induced subgraph $\mathcal{G}_{\mathrm{un}_{\mathcal{G}}}$ is a complete undirected graph;

(iii) if $\alpha \in V \setminus \mathrm{un}_{\mathcal{G}}$ then $\mathrm{ant}_{\mathcal{G}}(\alpha) = \mathrm{pa}_{\mathcal{G}}(\alpha) \cup \{\alpha\}$;

(iv) if $\alpha \in \mathrm{un}_{\mathcal{G}}$ then $\mathrm{ant}_{\mathcal{G}}(\alpha) = \mathrm{un}_{\mathcal{G}}$.

Proof: If $\alpha \to \beta \to \gamma$ in $\mathcal{G}$ then $\alpha \to \gamma$, since if $\alpha - \gamma$, $\alpha \leftarrow \gamma$, or $\alpha \leftrightarrow \gamma$ then $\mathcal{G}$ would not be ancestral, establishing (i). If $\alpha, \beta \in \mathrm{un}_{\mathcal{G}}$ then by Lemma 3.7, $\alpha - \beta$, which establishes (ii). Suppose $\alpha \in V \setminus \mathrm{un}_{\mathcal{G}}$, $\beta \in \mathrm{ant}_{\mathcal{G}}(\alpha)$. If $\beta \in \mathrm{un}_{\mathcal{G}}$ then $\beta \to \alpha$, by Lemma 3.7; if $\beta \in V \setminus \mathrm{un}_{\mathcal{G}}$ then $\beta \in \mathrm{an}_{\mathcal{G}}(\alpha)$ and so $\beta \to \alpha$ by (i). Hence (iii) holds. (iv) follows directly from (ii). □

4 Marginalizing and Conditioning

In this section we first introduce marginalizing and conditioning for an independence model. We then define a graphical transformation of an ancestral graph. We show that the independence model corresponding to the transformed graph is the independence model obtained by marginalizing and conditioning the independence model of the original graph. In the remaining subsections we derive several useful consequences.


4.1 Marginalizing and Conditioning Independence Models ($\mathfrak{I}[^{S}_{L}$)

An independence model $\mathfrak{I}$ with vertex set $V$ after marginalizing out a subset $L$ is simply the subset of triples which do not involve any vertices in $L$. More formally we define:

$$\mathfrak{I}[_{L} = \{ \langle X, Y \mid Z \rangle \;|\; \langle X, Y \mid Z \rangle \in \mathfrak{I};\ (X \cup Y \cup Z) \cap L = \emptyset \}.$$

If $\mathfrak{I}$ contains the independence relations present in a distribution $P$, then $\mathfrak{I}[_{L}$ contains the subset of independence relations remaining after marginalizing out the 'latent' variables in $L$; see Theorem 7.1. (Note the distinct uses of the vertical bar in $\langle \cdot, \cdot \mid \cdot \rangle$ and $\{ \cdot \mid \cdot \}$.)

An independence model $\mathfrak{I}$ with vertex set $V$ after conditioning on a subset $S$ is the set of triples defined as follows:

$$\mathfrak{I}[^{S} = \{ \langle X, Y \mid Z \rangle \;|\; \langle X, Y \mid Z \cup S \rangle \in \mathfrak{I};\ (X \cup Y \cup Z) \cap S = \emptyset \}.$$

Thus if $\mathfrak{I}$ contains the independence relations present in a distribution $P$ then $\mathfrak{I}[^{S}$ constitutes the subset of independencies holding among the remaining variables after conditioning on $S$; see Theorem 7.1. (Note that the set $S$ is suppressed in the conditioning set in the independence relations in the resulting independence model.) The letter $S$ is used because Selection effects represent one context in which conditioning may occur.

Combining these definitions we obtain:

$$\mathfrak{I}[^{S}_{L} = \{ \langle X, Y \mid Z \rangle \;|\; \langle X, Y \mid Z \cup S \rangle \in \mathfrak{I};\ (X \cup Y \cup Z) \cap (S \cup L) = \emptyset \}.$$

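Proposition 4.1 For an independence model $\mathfrak{I}$ over $V$ containing disjoint subsets $S_1$, $S_2$, $L_1$, $L_2$:

(i) $\mathfrak{I}[^{\emptyset}_{\emptyset} = \mathfrak{I}$,

(ii) $\left(\mathfrak{I}[^{S_1}_{L_1}\right)[^{S_2}_{L_2} = \mathfrak{I}[^{S_1 \cup S_2}_{L_1 \cup L_2}$.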


4.1.1  Example 

Consider  the  following  independence  model: 

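$$\mathfrak{I}^* = \{ \langle \{a, x\}, \{b, y\} \mid \{t\} \rangle,\ \langle \{a, x\}, \{b\} \mid \emptyset \rangle,\ \langle \{b, y\}, \{a\} \mid \emptyset \rangle,\ \langle \{a, b\}, \{t\} \mid \emptyset \rangle \}.$$

In fact, $\mathfrak{I}^* \subset \mathfrak{I}_m(\mathcal{D})$, where $\mathcal{D}$ is the DAG in Figure 10(i). In this case:

$$\mathfrak{I}^*[^{\emptyset}_{\{t\}} = \{ \langle \{a, x\}, \{b\} \mid \emptyset \rangle,\ \langle \{b, y\}, \{a\} \mid \emptyset \rangle \}, \qquad \mathfrak{I}^*[^{\{t\}}_{\emptyset} = \{ \langle \{a, x\}, \{b, y\} \mid \emptyset \rangle \}.$$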
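The two operations are simple set manipulations, which the following Python sketch makes explicit. The representation is an assumption made for illustration: a triple $\langle X, Y \mid Z \rangle$ is stored as a tuple of three `frozenset`s, vertices are single characters, and the function name `marginalize_and_condition` is hypothetical. The sketch applies the combined definition directly and reproduces the example above.

```python
def marginalize_and_condition(model, S=(), L=()):
    """Return the model obtained by conditioning on S and marginalizing out L.

    A triple <X, Y | Z> is kept exactly when <X, Y | Z u S> is in `model`
    and X, Y, Z contain no vertex of S or L.
    """
    S, L = frozenset(S), frozenset(L)
    result = set()
    for X, Y, W in model:
        if not S <= W:
            continue            # the conditioning set must contain all of S
        Z = W - S               # S is suppressed in the new conditioning set
        if (X | Y | Z) & (S | L):
            continue            # the triple involves a removed vertex
        result.add((X, Y, Z))
    return result

fs = frozenset
# The independence model of Section 4.1.1 (single-character vertex names).
I_star = {
    (fs("ax"), fs("by"), fs("t")),
    (fs("ax"), fs("b"), fs()),
    (fs("by"), fs("a"), fs()),
    (fs("ab"), fs("t"), fs()),
}

marginal = marginalize_and_condition(I_star, L="t")     # marginalize out t
conditional = marginalize_and_condition(I_star, S="t")  # condition on t
print(len(marginal), len(conditional))
```

With these inputs `marginal` contains the two triples that do not mention $t$, and `conditional` contains the single triple $\langle \{a,x\}, \{b,y\} \mid \emptyset \rangle$, in agreement with the displayed sets.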

4.2  Marginalizing  and  Conditioning  for  Ancestral  Graphs 

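Given an ancestral graph $\mathcal{G}$ with vertex set $V$, for arbitrary disjoint sets $S$, $L$ (both possibly empty) we now define a transformation:

$$\mathcal{G} \mapsto \mathcal{G}[^{S}_{L}.$$

The main result of this section will be:

Theorem 4.18 If $\mathcal{G}$ is an ancestral graph over $V$, and $S \mathbin{\dot\cup} L \subset V$, then

$$\mathfrak{I}_m(\mathcal{G})[^{S}_{L} = \mathfrak{I}_m(\mathcal{G}[^{S}_{L})$$

(where $A \mathbin{\dot\cup} B$ denotes the disjoint union of $A$ and $B$).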

In  words,  the  independence  model  corresponding  to  the  transformed  graph 
is  the  independence  model  obtained  by  marginalizing  and  conditioning  the 
independence  model  of  the  original  graph. 

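Though we define this transformation for any ancestral graph $\mathcal{G}$, our primary motivation is the case in which $\mathcal{G}$ is a DAG, representing some data generating process that is partially observed (corresponding to marginalization) and where selection effects may be present (corresponding to conditioning). See Cox and Wermuth (1996) for further discussion of data-generating processes, marginalizing and conditioning.

4.2.1 Definition of $\mathcal{G}[^{S}_{L}$

Graph $\mathcal{G}[^{S}_{L}$ has vertex set $V \setminus (S \mathbin{\dot\cup} L)$, and edges specified as follows:

If $\alpha$, $\beta$ are such that, for all $Z$ with $Z \subseteq V \setminus (S \cup L \cup \{\alpha, \beta\})$,

$$\langle \{\alpha\}, \{\beta\} \mid Z \cup S \rangle \notin \mathfrak{I}_m(\mathcal{G}),$$

and

$$
\left.
\begin{array}{ll}
\alpha \in \mathrm{ant}_{\mathcal{G}}(\{\beta\} \cup S); & \beta \in \mathrm{ant}_{\mathcal{G}}(\{\alpha\} \cup S) \\
\alpha \notin \mathrm{ant}_{\mathcal{G}}(\{\beta\} \cup S); & \beta \in \mathrm{ant}_{\mathcal{G}}(\{\alpha\} \cup S) \\
\alpha \in \mathrm{ant}_{\mathcal{G}}(\{\beta\} \cup S); & \beta \notin \mathrm{ant}_{\mathcal{G}}(\{\alpha\} \cup S) \\
\alpha \notin \mathrm{ant}_{\mathcal{G}}(\{\beta\} \cup S); & \beta \notin \mathrm{ant}_{\mathcal{G}}(\{\alpha\} \cup S)
\end{array}
\right\}
\text{ then }
\left\{
\begin{array}{l}
\alpha - \beta \\
\alpha \leftarrow \beta \\
\alpha \to \beta \\
\alpha \leftrightarrow \beta
\end{array}
\right\}
\text{ in } \mathcal{G}[^{S}_{L}.
$$

In words, $\mathcal{G}[^{S}_{L}$ is a graph containing the vertices that are not in $S$ or $L$. Two vertices $\alpha$, $\beta$ are adjacent in $\mathcal{G}[^{S}_{L}$ if $\alpha$ and $\beta$ are m-connected in $\mathcal{G}$ given every subset that contains all vertices in $S$ and no vertices in $L$. If $\alpha$ and $\beta$ are adjacent in $\mathcal{G}[^{S}_{L}$ then there is an arrowhead at $\alpha$ if and only if $\alpha$ is not anterior to either $\beta$ or $S$ in $\mathcal{G}$, and a tail otherwise.

Note that if $\mathcal{G}$ is not maximal then $\mathcal{G}[^{\emptyset}_{\emptyset} \neq \mathcal{G}$. (See Corollary 5.2.) We will show in Corollary 4.19 that $\mathcal{G}[^{S}_{L}$ is always maximal.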

4.2.2  Examples 

Consider the DAG, $\mathcal{D}$, shown in Figure 10(i). The independence model $\mathfrak{I}_m(\mathcal{D}) \supset \mathfrak{I}^*$, given in Section 4.1.1. Suppose that we set $L = \{t\}$, $S = \emptyset$.


Figure 10: (i) A simple DAG model, $\mathcal{D}$; (ii) the graph $\mathcal{D}[^{\emptyset}_{\{t\}}$; (iii) the graph $\mathcal{D}[^{\{t\}}_{\emptyset}$. See text for further explanation.

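First consider the adjacencies that will be present in the transformed graph $\mathcal{D}[^{\emptyset}_{\{t\}}$. It follows directly from the definition that vertices that are adjacent in the original graph will also be adjacent in the transformed graph, if they are present in the new graph, since adjacent vertices are m-connected given any subset of the remaining vertices. Hence the pairs $(a, x)$ and $(b, y)$ will be adjacent in $\mathcal{D}[^{\emptyset}_{\{t\}}$. In addition, $x$ and $y$ will be adjacent since any set m-separating $x$ and $y$ in $\mathcal{D}$ contains $t$, hence there is no set $Z \subseteq \{a, b\}$ such that $\langle \{x\}, \{y\} \mid Z \rangle \in \mathfrak{I}_m(\mathcal{D})$. Since $\langle \{a\}, \{b, y\} \mid \emptyset \rangle, \langle \{b\}, \{a, x\} \mid \emptyset \rangle \in \mathfrak{I}_m(\mathcal{D})$ there are no other adjacencies. It remains to determine the type of these three edges in $\mathcal{D}[^{\emptyset}_{\{t\}}$. Since $x \notin \mathrm{ant}_{\mathcal{D}}(y)$, and $y \notin \mathrm{ant}_{\mathcal{D}}(x)$, the edge between $x$ and $y$ is of the form $x \leftrightarrow y$. Similarly the other edges are $a \to x$ and $b \to y$. Thus the graph $\mathcal{D}[^{\emptyset}_{\{t\}}$ is as shown in Figure 10(ii). Observe that

$$\mathfrak{I}^*[^{\emptyset}_{\{t\}} \subset \mathfrak{I}_m(\mathcal{D}[^{\emptyset}_{\{t\}}).$$

Now suppose that $L = \emptyset$, $S = \{t\}$. Since $\langle \{a, x\}, \{b, y\} \mid \{t\} \rangle \in \mathfrak{I}_m(\mathcal{D})$, it follows that $(a, x)$ and $(b, y)$ are the only pairs of adjacent vertices present in the transformed graph $\mathcal{D}[^{\{t\}}_{\emptyset}$, hence this graph takes the form shown in Figure 10(iii). Note that $\mathfrak{I}^*[^{\{t\}}_{\emptyset} \subset \mathfrak{I}_m(\mathcal{D}[^{\{t\}}_{\emptyset})$.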
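The orientation rule in the definition of the transformed graph can be checked mechanically on this example. The following Python sketch is an illustration under stated assumptions: the graph is a DAG (so that anterior vertices are simply ancestors), it is given as a hypothetical `parents` mapping, and Figure 10(i) is taken to be the DAG $a \to x \leftarrow t \to y \leftarrow b$, as the discussion in the text indicates. The function `edge_type` only orients an edge that is already known to be present; it does not test adjacency.

```python
def ancestors(parents, targets):
    """Vertices with a directed path to some target in a DAG (targets included)."""
    result, stack = set(targets), list(targets)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def edge_type(parents, alpha, beta, S=()):
    """Type of the edge between alpha and beta in the transformed graph.

    Assumes alpha and beta are adjacent after the transformation. There is a
    tail at a vertex exactly when it is an ancestor of the other endpoint or of S.
    """
    tail_at_alpha = alpha in ancestors(parents, {beta} | set(S))
    tail_at_beta = beta in ancestors(parents, {alpha} | set(S))
    if tail_at_alpha and tail_at_beta:
        return f"{alpha} -- {beta}"
    if tail_at_alpha:
        return f"{alpha} -> {beta}"
    if tail_at_beta:
        return f"{alpha} <- {beta}"
    return f"{alpha} <-> {beta}"

# Assumed structure of Figure 10(i): a -> x <- t -> y <- b.
parents = {"x": ["a", "t"], "y": ["t", "b"]}

for pair in [("a", "x"), ("b", "y"), ("x", "y")]:
    print(edge_type(parents, *pair))           # L = {t}, S = {}
print(edge_type(parents, "a", "x", S={"t"}))   # L = {}, S = {t}
```

For a second, purely hypothetical DAG $a \to s \leftarrow b$ with $S = \{s\}$, the same rule yields an undirected edge between $a$ and $b$, which illustrates how conditioning can introduce undirected edges.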

Another  example  of  this  transformation  is  given  in  Figure  11,  with  a  more 
complex  DAG  V .  Note  the  edge  between  a  and  c  that  is  present  in 




Figure  11:  (i)  Another  DAG,  V;  (ii)  the  graph  (iii)  the  graph  h} 

(iv) the graph

4.2.3 Adjacencies in $\mathcal{G}[^{S}_{L}$ and inducing paths

A path $\pi$ between $\alpha$ and $\beta$ on which every collider is an ancestor of $\{\alpha, \beta\} \cup S$ and every non-collider is in $L$ is called an inducing path with respect to $S$ and $L$. This is a generalization of the definition introduced by Verma and Pearl (1990). An inducing path with respect to $S = \emptyset$, $L = \emptyset$ is called primitive. Note that if $\alpha, \beta \in V \setminus (S \cup L)$, and $\alpha$, $\beta$ are adjacent in $\mathcal{G}$ then the edge joining $\alpha$ and $\beta$ is (trivially) an inducing path w.r.t. $S$ and $L$ in $\mathcal{G}$.

In Figure 10(i) the path $x \leftarrow t \to y$ forms an inducing path w.r.t. $S = \emptyset$, $L = \{t\}$; in Figure 11(i) the path $a \leftarrow l_1 \to b \leftarrow l_2 \to c$ forms an inducing path w.r.t. $S = \{s\}$, $L = \{l_1, l_2\}$; in Figure 9(a), $\gamma \leftrightarrow \beta \leftrightarrow \alpha \leftrightarrow \delta$ forms a primitive inducing path between $\gamma$ and $\delta$. (Other inducing paths are also present in these graphs.)
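The definition of an inducing path translates directly into a check on a given path. The sketch below is illustrative only: the edge encoding (triples `(u, kind, v)` with `kind` in `"->"`, `"<->"`, `"--"`) and the names `build_marks`, `ancestors` and `is_inducing_path` are assumptions, and the second test graph merely imitates the path quoted from Figure 11(i), with the edge $b \to s$ added as an assumption so that the collider $b$ is an ancestor of $S$.

```python
def build_marks(edges):
    """Map each ordered adjacent pair (u, v) to the mark at v: '>' or '-'."""
    marks = {}
    for u, kind, v in edges:
        if kind == "->":
            marks[(u, v)], marks[(v, u)] = ">", "-"
        elif kind == "<->":
            marks[(u, v)], marks[(v, u)] = ">", ">"
        elif kind == "--":
            marks[(u, v)], marks[(v, u)] = "-", "-"
        else:
            raise ValueError(f"unknown edge kind: {kind}")
    return marks

def ancestors(marks, targets):
    """Vertices with a directed path to some target (targets included)."""
    result, stack = set(targets), list(targets)
    while stack:
        v = stack.pop()
        for (p, c), mark_at_c in marks.items():
            # p -> c: arrowhead at c and tail at p
            if c == v and mark_at_c == ">" and marks[(c, p)] == "-" and p not in result:
                result.add(p)
                stack.append(p)
    return result

def is_inducing_path(edges, path, S=(), L=()):
    """Check whether `path` (a list of vertices) is an inducing path w.r.t. S and L."""
    marks = build_marks(edges)
    if any((u, v) not in marks for u, v in zip(path, path[1:])):
        raise ValueError("consecutive vertices on the path must be adjacent")
    anc = ancestors(marks, {path[0], path[-1]} | set(S))
    for u, v, w in zip(path, path[1:], path[2:]):
        collider = marks[(u, v)] == ">" and marks[(w, v)] == ">"
        if collider and v not in anc:
            return False        # a collider must be an ancestor of the endpoints or S
        if not collider and v not in L:
            return False        # a non-collider must be marginalized out
    return True

# Assumed structure of Figure 10(i): a -> x <- t -> y <- b.
dag10 = [("a", "->", "x"), ("t", "->", "x"), ("t", "->", "y"), ("b", "->", "y")]
print(is_inducing_path(dag10, ["x", "t", "y"], L={"t"}))

# Hypothetical DAG containing the path a <- l1 -> b <- l2 -> c, with b -> s assumed.
dag11 = [("l1", "->", "a"), ("l1", "->", "b"), ("l2", "->", "b"),
         ("l2", "->", "c"), ("b", "->", "s")]
print(is_inducing_path(dag11, ["a", "l1", "b", "l2", "c"], S={"s"}, L={"l1", "l2"}))
```

A single edge passes the check trivially, since it has no interior vertices, which matches the remark that an edge between two retained vertices is itself an inducing path.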

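Theorem 4.2 If $\mathcal{G}$ is an ancestral graph, with vertex set $V = O \mathbin{\dot\cup} S \mathbin{\dot\cup} L$, and $\alpha, \beta \in O$ then the following six conditions are equivalent:

(i) There is an edge between $\alpha$ and $\beta$ in $\mathcal{G}[^{S}_{L}$.

(ii) There is an inducing path between $\alpha$ and $\beta$ w.r.t. $S$ and $L$ in $\mathcal{G}$.

(iii) There is a path between $\alpha$ and $\beta$ in $(\mathcal{G}_{\mathrm{ant}(\{\alpha,\beta\} \cup S)})^a$ on which every vertex, except the endpoints, is in $L$.

(iv) The vertices in $\mathrm{ant}(\{\alpha, \beta\} \cup S)$ that are not in $L \cup \{\alpha, \beta\}$ do not m-separate $\alpha$ and $\beta$ in $\mathcal{G}$:

$$\langle \{\alpha\}, \{\beta\} \mid \mathrm{ant}(\{\alpha, \beta\} \cup S) \setminus (L \cup \{\alpha, \beta\}) \rangle \notin \mathfrak{I}_m(\mathcal{G}).$$

(v) $\forall Z$, $Z \subseteq V \setminus (S \cup L \cup \{\alpha, \beta\})$, $\langle \{\alpha\}, \{\beta\} \mid Z \cup S \rangle \notin \mathfrak{I}_m(\mathcal{G})$.

(vi) $\forall Z$, $Z \subseteq V \setminus (S \cup L \cup \{\alpha, \beta\})$, $\langle \{\alpha\}, \{\beta\} \mid Z \rangle \notin \mathfrak{I}_m(\mathcal{G})[^{S}_{L}$.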

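Proof: Let $Z^* = \mathrm{ant}(\{\alpha, \beta\} \cup S) \setminus (L \cup \{\alpha, \beta\})$. By Proposition 2.2(i)

$$
\begin{aligned}
\mathrm{ant}(\{\alpha, \beta\} \cup Z^*) &= \mathrm{ant}\big(\{\alpha, \beta\} \cup (\mathrm{ant}(\{\alpha, \beta\} \cup S) \setminus (L \cup \{\alpha, \beta\}))\big) \\
&= \mathrm{ant}\big(\mathrm{ant}(\{\alpha, \beta\} \cup S) \setminus L\big) \\
&= \mathrm{ant}(\{\alpha, \beta\} \cup S). \qquad (\dagger)
\end{aligned}
$$

In addition, let $T^* = \mathrm{ant}(\{\alpha, \beta\} \cup S) \cap (L \cup \{\alpha, \beta\})$, so

$$T^* \cup Z^* = \mathrm{ant}(\{\alpha, \beta\} \cup S). \qquad (\ddagger)$$

(iii)$\Leftrightarrow$(iv) Since, by Theorem 3.18, $\mathfrak{I}_{m^*}(\mathcal{G}) = \mathfrak{I}_m(\mathcal{G})$, (iv) holds if and only if there is a path $\mu$ connecting $\alpha$ and $\beta$ in $(\mathcal{G}_{\mathrm{ant}(\{\alpha,\beta\} \cup Z^*)})^a$ on which no vertex is in $Z^*$, and hence by ($\ddagger$) every vertex is in $T^*$. Further, by ($\dagger$), $\mathcal{G}_{\mathrm{ant}(\{\alpha,\beta\} \cup Z^*)} = \mathcal{G}_{\mathrm{ant}(\{\alpha,\beta\} \cup S)}$, hence by the definition of $T^*$, $\mu$ satisfies the conditions given in (iii).

(ii)$\Rightarrow$(iv) If there is an inducing path $\pi$ in $\mathcal{G}$ w.r.t. $S$ and $L$, then no non-collider on $\pi$ is in $Z^*$, since $Z^* \cap L = \emptyset$, and any collider on $\pi$ is in $\mathrm{an}(\{\alpha, \beta\} \cup S) \subseteq \mathrm{ant}(\{\alpha, \beta\} \cup S) = \mathrm{ant}(\{\alpha, \beta\} \cup Z^*)$ by ($\dagger$). Hence by Corollary 3.15 there is a path $\pi^*$ which m-connects $\alpha$ and $\beta$ given $Z^*$ in $\mathcal{G}$ as required.

(iv)$\Rightarrow$(ii) Let $\nu$ be a path which m-connects $\alpha$ and $\beta$ given $Z^*$. By Lemma 3.13 and ($\dagger$), every vertex on $\nu$ is in $\mathrm{ant}(\{\alpha, \beta\} \cup S)$, hence by Lemma 3.2(b) and Corollary 3.3, every collider is in $\mathrm{an}(\{\alpha, \beta\} \cup S)$. Every non-collider is in $\mathrm{ant}(\{\alpha, \beta\} \cup S) \setminus Z^* \subseteq L \cup \{\alpha, \beta\}$, so every non-collider is in $L$. Hence $\nu$ is an inducing path w.r.t. $S$ and $L$ in $\mathcal{G}$.

(iii)$\Rightarrow$(v) Every edge present in $(\mathcal{G}_{\mathrm{ant}(\{\alpha,\beta\} \cup S)})^a$ is also present in $(\mathcal{G}_{\mathrm{ant}(\{\alpha,\beta\} \cup Z \cup S)})^a$. The implication then follows since every non-endpoint vertex on the path is in $L$.

(v)$\Rightarrow$(iv) This follows trivially taking $Z = Z^* \setminus S$.

(v)$\Leftrightarrow$(i) Definition of $\mathcal{G}[^{S}_{L}$.

(v)$\Leftrightarrow$(vi) Definition of $\mathfrak{I}_m(\mathcal{G})[^{S}_{L}$.

□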


An important consequence of condition (iv) in this Theorem is that a single test of m-separation in $\mathcal{G}$ is sufficient to determine whether or not a given adjacency is present in $\mathcal{G}[^{S}_{L}$; it is not necessary to test every subset of $V \setminus (S \cup L \cup \{\alpha, \beta\})$. Likewise properties (ii) and (iii) provide conditions that can be tested in polynomial time.
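As an illustration of how condition (iii) might be tested, the following Python sketch handles the special case in which the original graph is a DAG. In that case anterior vertices are ancestors and, as noted in Section 3.5.2, augmentation coincides with moralization, so the test reduces to a search in a moral graph whose interior vertices are restricted to $L$. The `parents` mapping and the function names are assumptions made for this example; it is a sketch of the idea, not a general implementation for ancestral graphs.

```python
from collections import deque
from itertools import combinations

def ancestors(parents, targets):
    """Vertices with a directed path to some target in a DAG (targets included)."""
    result, stack = set(targets), list(targets)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def adjacent_after_transform(parents, alpha, beta, S=(), L=()):
    """Condition (iii) for a DAG: is there a path from alpha to beta in the moral
    graph of the ancestral subgraph of {alpha, beta} and S whose interior lies in L?"""
    keep = ancestors(parents, {alpha, beta} | set(S))
    adj = {v: set() for v in keep}
    for child in keep:
        pas = [p for p in parents.get(child, ()) if p in keep]
        for p in pas:                       # original edges p -> child
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(pas, 2):   # 'marry' parents of a common child
            adj[p].add(q)
            adj[q].add(p)
    L = set(L)
    seen, queue = {alpha}, deque([alpha])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w == beta:
                return True
            if w in L and w not in seen:    # only vertices in L may be interior
                seen.add(w)
                queue.append(w)
    return False

# Assumed structure of Figure 10(i): a -> x <- t -> y <- b.
parents = {"x": ["a", "t"], "y": ["t", "b"]}
observed = ["a", "b", "x", "y"]
for S, L in [(set(), {"t"}), ({"t"}, set())]:
    pairs = [(u, v) for u, v in combinations(observed, 2)
             if adjacent_after_transform(parents, u, v, S=S, L=L)]
    print(sorted(pairs))
```

Under the stated assumption about Figure 10(i), marginalizing $t$ yields the adjacencies $(a, x)$, $(b, y)$ and $(x, y)$, while conditioning on $t$ yields only $(a, x)$ and $(b, y)$, matching the discussion in Section 4.2.2. Each test requires one ancestor computation and one graph search.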


4.2.4 Primitive inducing paths and maximality

Corollary 4.3 If $\mathcal{G}$ is an ancestral graph, then there is no set $Z$ ($\alpha, \beta \notin Z$) such that $\langle \{\alpha\}, \{\beta\} \mid Z \rangle \in \mathfrak{I}_m(\mathcal{G})$ if and only if there is a primitive inducing path between $\alpha$ and $\beta$ in $\mathcal{G}$.

Proof: The result follows from (ii) $\Leftrightarrow$ (v) in Theorem 4.2 with $S = \emptyset$, $L = \emptyset$. □

Corollary 4.4 Every non-maximal ancestral graph contains a primitive inducing path between a pair of non-adjacent vertices.

Proof: Immediate by the definition of maximality and Corollary 4.3. □

Primitive inducing paths with more than one edge take a very special form, as described in the next Lemma, and illustrated by the inducing path $\gamma \leftrightarrow \beta \leftrightarrow \alpha \leftrightarrow \delta$ in Figure 9(a).

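Lemma 4.5 Let $\mathcal{G}$ be an ancestral graph. If $\pi$ is a primitive inducing path between $\alpha$ and $\beta$ in $\mathcal{G}$, and $\pi$ contains more than one edge, then:

(i) every non-endpoint vertex on $\pi$ is a collider and in $\mathrm{ant}_{\mathcal{G}}(\{\alpha, \beta\})$;

(ii) $\alpha \notin \mathrm{ant}_{\mathcal{G}}(\beta)$ and $\beta \notin \mathrm{ant}_{\mathcal{G}}(\alpha)$;

(iii) every edge on $\pi$ is bi-directed.

Proof: (i) is a direct consequence of the definition of a primitive inducing path. Consider the vertex $\gamma$ which is adjacent to $\alpha$ on $\pi$. By (i), $\gamma$ is a collider on $\pi$, so $\gamma \in \mathrm{sp}_{\mathcal{G}}(\alpha) \cup \mathrm{ch}_{\mathcal{G}}(\alpha)$, so $\gamma \notin \mathrm{ant}_{\mathcal{G}}(\alpha)$ as $\mathcal{G}$ is ancestral. Hence by (i) $\gamma \in \mathrm{ant}_{\mathcal{G}}(\beta)$. If $\beta \in \mathrm{ant}_{\mathcal{G}}(\alpha)$ then $\gamma \in \mathrm{ant}_{\mathcal{G}}(\alpha)$, but this is a contradiction. Thus $\beta \notin \mathrm{ant}_{\mathcal{G}}(\alpha)$. By a similar argument $\alpha \notin \mathrm{ant}_{\mathcal{G}}(\beta)$, establishing (ii). (iii) follows directly from (i) and (ii), since $\mathcal{G}$ is ancestral. □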

Lemma 4.5(ii) has the following consequence:

Corollary 4.6 In a maximal ancestral graph $\mathcal{G}$, if there is a primitive inducing path between $\alpha$ and $\beta$ containing more than one edge, then there is an edge $\alpha \leftrightarrow \beta$ in $\mathcal{G}$.

Proof: Since $\mathcal{G}$ is maximal, by Corollary 4.3, $\alpha$ and $\beta$ are adjacent in $\mathcal{G}$. By Lemma 4.5(ii), $\alpha \notin \mathrm{ant}_{\mathcal{G}}(\beta)$ and $\beta \notin \mathrm{ant}_{\mathcal{G}}(\alpha)$, hence by Lemma 3.9, it follows that $\alpha \leftrightarrow \beta$ in $\mathcal{G}$. □

Note that if $\mathcal{G}$ is a maximal ancestral graph and $\mathcal{G}'$ is a subgraph formed by removing an undirected or directed edge from $\mathcal{G}$ then $\mathcal{G}'$ is also maximal.

4.2.5 Anterior relations in $\mathcal{G}[^{S}_{L}$

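The next Lemma characterizes the vertices anterior to $\alpha$ in $\mathcal{G}[^{S}_{L}$.

Lemma 4.7 For an ancestral graph $\mathcal{G}$ with vertex set $V = O \mathbin{\dot\cup} S \mathbin{\dot\cup} L$, if $\alpha \in O$ then

$$\mathrm{ant}_{\mathcal{G}}(\alpha) \setminus \big(\mathrm{ant}_{\mathcal{G}}(S) \cup L\big) \subseteq \mathrm{ant}_{\mathcal{G}[^{S}_{L}}(\alpha) \subseteq \mathrm{ant}_{\mathcal{G}}(\{\alpha\} \cup S) \setminus (S \cup L).$$

In words, if $\alpha$, $\beta$ are in $\mathcal{G}[^{S}_{L}$ and $\alpha$ is anterior to $\beta$ but not $S$ in $\mathcal{G}$, then $\alpha$ is also anterior to $\beta$ in $\mathcal{G}[^{S}_{L}$. Conversely, if $\alpha$ is anterior to $\beta$ in $\mathcal{G}[^{S}_{L}$ then $\alpha$ is anterior to either $\beta$ or $S$ in $\mathcal{G}$.

Proof: Let $\mu$ be an anterior path from a vertex $\beta \in \mathrm{ant}_{\mathcal{G}}(\alpha) \setminus (L \cup \mathrm{ant}_{\mathcal{G}}(S))$ to $\alpha$ in $\mathcal{G}$. Note that no vertex on $\mu$ is in $S$. Consider the subsequence $(\beta \equiv \omega_m, \ldots, \omega_i, \ldots, \omega_1 \equiv \alpha)$ of vertices on $\mu$ that are in $V \setminus (S \cup L)$. Now the subpath $\mu(\omega_{i+1}, \omega_i)$ is an anterior path on which every vertex except the endpoints is in $L$. Hence $\omega_i$ and $\omega_{i+1}$ are adjacent in $\mathcal{G}[^{S}_{L}$. Further, since $\omega_{i+1} \in \mathrm{ant}_{\mathcal{G}}(\omega_i)$ it follows that either $\omega_{i+1} - \omega_i$ or $\omega_{i+1} \to \omega_i$, hence $\beta = \omega_m \in \mathrm{ant}_{\mathcal{G}[^{S}_{L}}(\alpha)$, as required.

To prove the second assertion, let $\nu \equiv (\phi_n, \ldots, \phi_1 \equiv \alpha)$ be an anterior path from a vertex $\phi_n \in \mathrm{ant}_{\mathcal{G}[^{S}_{L}}(\alpha)$ to $\alpha$ in $\mathcal{G}[^{S}_{L}$. For $1 \le i < n$, either $\phi_{i+1} - \phi_i$ or $\phi_{i+1} \to \phi_i$ on $\nu$. By definition of $\mathcal{G}[^{S}_{L}$, in either case $\phi_{i+1} \in \mathrm{ant}_{\mathcal{G}}(\{\phi_i\} \cup S) \setminus (S \cup L)$. Thus $\phi_n \in \mathrm{ant}_{\mathcal{G}}(\{\alpha\} \cup S) \setminus (S \cup L)$. □

Taking $S = \emptyset$ in Lemma 4.7 we obtain the following:

Corollary 4.8 In an ancestral graph $\mathcal{G} = (V, E)$ if $\alpha \in V \setminus L$ then $\mathrm{ant}_{\mathcal{G}}(\alpha) \setminus L = \mathrm{ant}_{\mathcal{G}[^{\emptyset}_{L}}(\alpha)$.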


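4.2.6 The undirected subgraph of $\mathcal{G}[^{S}_{L}$

Lemma 4.9 If $\mathcal{G}$ is an ancestral graph with vertex set $V = O \mathbin{\dot\cup} S \mathbin{\dot\cup} L$, then

$$\big(\mathrm{un}_{\mathcal{G}} \cup \mathrm{ant}_{\mathcal{G}}(S)\big) \setminus (S \cup L) \subseteq \mathrm{un}_{\mathcal{G}[^{S}_{L}}.$$

In words, any vertex in the undirected subgraph of $\mathcal{G}$ which is also present in $\mathcal{G}[^{S}_{L}$ will also be in the undirected subgraph of $\mathcal{G}[^{S}_{L}$. Likewise any vertex anterior to $S$ in $\mathcal{G}$ will be in the undirected component of $\mathcal{G}[^{S}_{L}$ if present in this graph.

Proof: Suppose for a contradiction that $\alpha \in (\mathrm{un}_{\mathcal{G}} \cup \mathrm{ant}_{\mathcal{G}}(S)) \setminus (S \cup L)$, but $\alpha \notin \mathrm{un}_{\mathcal{G}[^{S}_{L}}$. Hence there is a vertex $\beta$ such that either $\beta \leftrightarrow \alpha$ or $\beta \to \alpha$ in $\mathcal{G}[^{S}_{L}$. In both cases $\alpha \notin \mathrm{ant}_{\mathcal{G}}(\{\beta\} \cup S)$. Thus $\alpha \notin \mathrm{ant}_{\mathcal{G}}(S)$. Since $\alpha$ and $\beta$ are adjacent in $\mathcal{G}[^{S}_{L}$, by Theorem 4.2(ii) there is an inducing path $\pi$ between $\alpha$ and $\beta$ w.r.t. $S$ and $L$, hence every vertex on $\pi$ is in $\mathrm{ant}_{\mathcal{G}}(\{\alpha, \beta\} \cup S)$. If there are no colliders on $\pi$ then since $\alpha \in \mathrm{un}_{\mathcal{G}}$, $\pi$ is an anterior path from $\alpha$ to $\beta$ so $\alpha \in \mathrm{ant}_{\mathcal{G}}(\beta)$, which is a contradiction. If there is a collider on $\pi$ then let $\gamma$ be the collider on $\pi$ closest to $\alpha$. Now $\pi(\alpha, \gamma)$ is an anterior path from $\alpha$ to $\gamma$ so $\alpha \in \mathrm{ant}_{\mathcal{G}}(\gamma)$ but $\gamma \notin \mathrm{un}_{\mathcal{G}}$, hence by Lemma 3.8(ii), $\gamma \notin \mathrm{ant}_{\mathcal{G}}(\alpha)$. Thus $\gamma \in \mathrm{ant}_{\mathcal{G}}(\{\beta\} \cup S)$, and thus $\alpha \in \mathrm{ant}_{\mathcal{G}}(\{\beta\} \cup S)$, again a contradiction.

□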

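Corollary 4.10 If $\mathcal{G}$ is an ancestral graph with $V = O \mathbin{\dot\cup} S \mathbin{\dot\cup} L$ and $\alpha \in O$ then

$$\mathrm{ant}_{\mathcal{G}}(\alpha) \setminus (S \cup L) \subseteq \mathrm{un}_{\mathcal{G}[^{S}_{L}} \cup \mathrm{ant}_{\mathcal{G}[^{S}_{L}}(\alpha).$$

Thus the vertices anterior to $\alpha$ in $\mathcal{G}$ that are also in $\mathcal{G}[^{S}_{L}$ either remain anterior to $\alpha$ in $\mathcal{G}[^{S}_{L}$, or are in $\mathrm{un}_{\mathcal{G}[^{S}_{L}}$ (or both).

Proof:

$$
\begin{aligned}
\mathrm{ant}_{\mathcal{G}}(\alpha) \setminus (S \cup L) &\subseteq \big(\mathrm{ant}_{\mathcal{G}}(\alpha) \setminus (\mathrm{ant}_{\mathcal{G}}(S) \cup L)\big) \cup \big(\mathrm{ant}_{\mathcal{G}}(S) \setminus (S \cup L)\big) \\
&\subseteq \mathrm{ant}_{\mathcal{G}[^{S}_{L}}(\alpha) \cup \mathrm{un}_{\mathcal{G}[^{S}_{L}}. \qquad (*)
\end{aligned}
$$

The step marked (*) follows from Lemmas 4.7 and 4.9. □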

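Lemma 4.11 In an ancestral graph $\mathcal{G}$, if $\alpha \in \mathrm{ant}_{\mathcal{G}[^{S}_{L}}(\beta)$ and $\alpha \notin \mathrm{un}_{\mathcal{G}[^{S}_{L}}$ then $\alpha \in \mathrm{an}_{\mathcal{G}}(\beta)$, and $\alpha \notin \mathrm{ant}_{\mathcal{G}}(S)$.

Proof: If $\alpha \notin \mathrm{un}_{\mathcal{G}[^{S}_{L}}$, but $\alpha \in V \setminus (S \cup L)$ then by Lemma 4.9, $\alpha \notin \mathrm{un}_{\mathcal{G}} \cup \mathrm{ant}_{\mathcal{G}}(S)$. Since $\alpha \in \mathrm{ant}_{\mathcal{G}[^{S}_{L}}(\beta)$ it follows from Lemma 4.7 that $\alpha \in \mathrm{ant}_{\mathcal{G}}(\{\beta\} \cup S)$. So $\alpha \in \mathrm{ant}_{\mathcal{G}}(\beta)$. Further, since $\alpha \notin \mathrm{un}_{\mathcal{G}}$, by Lemma 3.8(iii), $\alpha \in \mathrm{an}_{\mathcal{G}}(\beta)$. □

Consequently, if in $\mathcal{G}[^{S}_{L}$ $\alpha$ is anterior to $\beta$ and there is an arrowhead at $\alpha$ then $\alpha$ is an ancestor of $\beta$ in $\mathcal{G}$.

4.2.7 $\mathcal{G}[^{S}_{L}$ is an ancestral graph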

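Theorem 4.12 If $\mathcal{G}$ is an arbitrary ancestral graph, with vertex set $V = O \mathbin{\dot\cup} S \mathbin{\dot\cup} L$, then $\mathcal{G}[^{S}_{L}$ is an ancestral graph.

Proof: Clearly $\mathcal{G}[^{S}_{L}$ is a mixed graph. Suppose for a contradiction that $\alpha \in \mathrm{ant}_{\mathcal{G}[^{S}_{L}}\big(\mathrm{pa}_{\mathcal{G}[^{S}_{L}}(\alpha) \cup \mathrm{sp}_{\mathcal{G}[^{S}_{L}}(\alpha)\big)$. Suppose $\alpha \in \mathrm{ant}_{\mathcal{G}[^{S}_{L}}(\beta)$ with $\beta \in \mathrm{pa}_{\mathcal{G}[^{S}_{L}}(\alpha) \cup \mathrm{sp}_{\mathcal{G}[^{S}_{L}}(\alpha)$. Then by Lemma 4.7, $\alpha \in \mathrm{ant}_{\mathcal{G}}(\{\beta\} \cup S)$. However if $\beta \in \mathrm{pa}_{\mathcal{G}[^{S}_{L}}(\alpha) \cup \mathrm{sp}_{\mathcal{G}[^{S}_{L}}(\alpha)$ then $\alpha \notin \mathrm{ant}_{\mathcal{G}}(\{\beta\} \cup S)$ by definition of $\mathcal{G}[^{S}_{L}$, which is a contradiction. Hence $\mathcal{G}[^{S}_{L}$ satisfies condition (i) for an ancestral graph.

Now suppose that $\mathrm{ne}_{\mathcal{G}[^{S}_{L}}(\alpha) \neq \emptyset$. Let $\beta \in \mathrm{ne}_{\mathcal{G}[^{S}_{L}}(\alpha)$. Then by the definition of $\mathcal{G}[^{S}_{L}$, $\alpha \in \mathrm{ant}_{\mathcal{G}}(\{\beta\} \cup S)$ and $\beta \in \mathrm{ant}_{\mathcal{G}}(\{\alpha\} \cup S)$. Thus either $\alpha \in \mathrm{ant}_{\mathcal{G}}(S)$ or, by Lemma 3.8(ii), $\alpha \in \mathrm{un}_{\mathcal{G}}$. It follows by Lemma 4.9 that $\alpha \in \mathrm{un}_{\mathcal{G}[^{S}_{L}}$, hence $\mathrm{pa}_{\mathcal{G}[^{S}_{L}}(\alpha) \cup \mathrm{sp}_{\mathcal{G}[^{S}_{L}}(\alpha) = \emptyset$. So $\mathcal{G}[^{S}_{L}$ satisfies condition (ii) for an ancestral graph. □

We will show in Section 4.2.10 that $\mathcal{G}[^{S}_{L}$ is a maximal ancestral graph.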

4.2.8  Introduction  of  undirected  and  bi-directed  edges 

As stated earlier, we are particularly interested in considering the
transformation $\mathcal{G} \mapsto \mathcal{G}[^S_L$ in the case where $\mathcal{G}$ is a DAG, and hence contains no
bi-directed or undirected edges. The following results show that the
introduction of undirected edges is naturally associated with conditioning, while
bi-directed edges are associated with marginalizing.

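Proposition 4.13 If $\mathcal{G}$ is an ancestral graph which contains no undirected
edges, then neither does $\mathcal{G}[^{\emptyset}_L$.

Proof: If $\alpha - \beta$ in $\mathcal{G}[^{\emptyset}_L$ then, by construction, $\alpha \in \operatorname{ant}_{\mathcal{G}}(\beta)$, $\beta \in \operatorname{ant}_{\mathcal{G}}(\alpha)$.
Hence by Lemma 3.8(ii) there is a path composed of undirected edges which
joins $\alpha$ and $\beta$ in $\mathcal{G}$, which is a contradiction. □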

In particular, if we begin with a DAG, then undirected edges will only be
present in the transformed graph if $S \neq \emptyset$; likewise it follows from the next
Proposition that bi-directed edges will only be present if $L \neq \emptyset$.

Proposition 4.14 If $\mathcal{G}$ is an ancestral graph which contains no bi-directed
edges then neither does $\mathcal{G}[^S_{\emptyset}$.

Proof: If $\alpha \leftrightarrow \beta$ in $\mathcal{G}[^S_{\emptyset}$ then $\alpha \notin \operatorname{ant}_{\mathcal{G}}(\{\beta\} \cup S)$ and $\beta \notin \operatorname{ant}_{\mathcal{G}}(\{\alpha\} \cup S)$. Since
there are no bi-directed edges in $\mathcal{G}$ it follows that $\alpha$ and $\beta$ are not adjacent
in $\mathcal{G}$. Since $L = \emptyset$, it further follows that any inducing path has the form
$\alpha \rightarrow \sigma \leftarrow \beta$, where $\sigma \in \operatorname{ant}_{\mathcal{G}}(S)$, contradicting $\alpha, \beta \notin \operatorname{ant}_{\mathcal{G}}(S)$. □
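
Propositions 4.13 and 4.14 can be illustrated on two three-vertex DAGs. The following is a minimal worked sketch, assuming the definition of the transformation used in the proofs above (two vertices of $O$ are adjacent when there is an inducing path between them with respect to $S$ and $L$, and the edge is oriented by membership in $\operatorname{ant}_{\mathcal{G}}(\{\cdot\} \cup S)$). The graphs $\mathcal{G}_1$ and $\mathcal{G}_2$ and the vertex names $\sigma$, $\lambda$ are illustrative choices, not examples from the text.

```latex
% Conditioning on a common child: S = {\sigma}, L = \emptyset.
% The path \alpha -> \sigma <- \beta is inducing because its only collider
% \sigma lies in ant(S); both endpoints are anterior to S.
\mathcal{G}_1 :\ \alpha \rightarrow \sigma \leftarrow \beta ,
\qquad
\alpha \in \operatorname{ant}_{\mathcal{G}_1}(\{\beta\} \cup S),\
\beta  \in \operatorname{ant}_{\mathcal{G}_1}(\{\alpha\} \cup S)
\ \Longrightarrow\
\alpha - \beta \ \text{ in } \mathcal{G}_1[^{\{\sigma\}}_{\emptyset} .

% Marginalizing a common parent: S = \emptyset, L = {\lambda}.
% The path \alpha <- \lambda -> \beta is inducing because its only
% non-endpoint vertex is a non-collider in L; neither endpoint is
% anterior to the other.
\mathcal{G}_2 :\ \alpha \leftarrow \lambda \rightarrow \beta ,
\qquad
\alpha \notin \operatorname{ant}_{\mathcal{G}_2}(\{\beta\}),\
\beta  \notin \operatorname{ant}_{\mathcal{G}_2}(\{\alpha\})
\ \Longrightarrow\
\alpha \leftrightarrow \beta \ \text{ in } \mathcal{G}_2[^{\emptyset}_{\{\lambda\}} .
```

In both cases the new edge type appears only because the corresponding set ($S$ in the first case, $L$ in the second) is non-empty, in line with the two Propositions.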

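4.2.9 The independence model $\mathfrak{I}_m(\mathcal{G}[^S_L)$

The following Lemmas and Corollary are required to prove Theorem 4.18.

Lemma 4.15 If $\mathcal{G}$ is an ancestral graph with $V = O \,\dot\cup\, S \,\dot\cup\, L$, and $\beta \in \operatorname{pa}_{\mathcal{G}[^S_L}(\alpha) \cup
\operatorname{sp}_{\mathcal{G}[^S_L}(\alpha)$ then $\alpha$ is not anterior to any vertex on an inducing path (w.r.t. $S$
and $L$) between $\alpha$ and $\beta$ in $\mathcal{G}$.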

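Proof: If $\beta \in \operatorname{pa}_{\mathcal{G}[^S_L}(\alpha) \cup \operatorname{sp}_{\mathcal{G}[^S_L}(\alpha)$, then $\alpha \notin \operatorname{un}_{\mathcal{G}[^S_L}$. It then follows by Lemma
4.9 that $\alpha \notin \operatorname{un}_{\mathcal{G}}$, and by construction of $\mathcal{G}[^S_L$ that $\alpha \notin \operatorname{ant}_{\mathcal{G}}(\{\beta\} \cup S)$. A
vertex $\gamma$ on an inducing path between $\alpha$ and $\beta$ is in $\operatorname{ant}_{\mathcal{G}}(\{\alpha, \beta\} \cup S)$. If
$\alpha \in \operatorname{ant}_{\mathcal{G}}(\gamma)$ then by Lemma 3.8(ii) $\gamma \notin \operatorname{ant}_{\mathcal{G}}(\alpha)$, since $\alpha \notin \operatorname{un}_{\mathcal{G}}$. Thus
$\gamma \in \operatorname{ant}_{\mathcal{G}}(\{\beta\} \cup S)$ but then $\alpha \in \operatorname{ant}_{\mathcal{G}}(\{\beta\} \cup S)$, which is a contradiction. □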

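Corollary 4.16 If $\alpha \leftrightarrow \beta$ or $\alpha \leftarrow \beta$ in $\mathcal{G}[^S_L$ and $\langle \alpha, \phi_1, \ldots, \phi_k, \beta \rangle$ is an
inducing path (w.r.t. $S$ and $L$) in $\mathcal{G}$ then $\phi_1 \in \operatorname{pa}_{\mathcal{G}}(\alpha) \cup \operatorname{sp}_{\mathcal{G}}(\alpha)$.

Proof: By Lemma 4.15 $\alpha \notin \operatorname{ant}_{\mathcal{G}}(\phi_1)$, hence $\phi_1 \in \operatorname{pa}_{\mathcal{G}}(\alpha) \cup \operatorname{sp}_{\mathcal{G}}(\alpha)$. □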

The  next  Lemma  forms  the  core  of  the  proof  of  Theorem  4.18. 

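Lemma 4.17 If $\mathcal{G}$ is an ancestral graph with $V = O \,\dot\cup\, S \,\dot\cup\, L$, $Z \cup \{\alpha, \beta\} \subseteq O$
then the following are equivalent:

(i) There is an edge between $\alpha$ and $\beta$ in $\bigl((\mathcal{G}[^S_L)_{\operatorname{ant}_{\mathcal{G}[^S_L}(\{\alpha,\beta\} \cup Z)}\bigr)^a$.

(ii) There is a path between $\alpha$ and $\beta$ in $\bigl(\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\alpha,\beta\} \cup Z \cup S)}\bigr)^a$ on which every
vertex, except the endpoints, is in $L$.

(iii) There is a path which m-connects $\alpha$ and $\beta$ in $\mathcal{G}$ given

$$\operatorname{ant}_{\mathcal{G}}(\{\alpha, \beta\} \cup Z \cup S) \setminus (L \cup \{\alpha, \beta\}).$$

Figure 12: Example of Lemma 4.17: (i) an ancestral graph $\mathcal{G}$; (ii)
the augmented graph $\bigl(\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\alpha,\beta\} \cup Z \cup S)}\bigr)^a$; (iii) the graph $\mathcal{G}[^S_L$; (iv) the
augmented graph $\bigl((\mathcal{G}[^S_L)_{\operatorname{ant}_{\mathcal{G}[^S_L}(\{\alpha,\beta\} \cup Z)}\bigr)^a$; (where $Z = \{\zeta\}$, $S = \{s\}$ and $L =
\{l_1, l_2, l_3, l_4, l_5\}$). See text for further explanation.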

Figure  12  gives  an  example  of  this  Lemma,  continued  below,  to  illustrate 
the  constructions  used  in  two  of  the  following  proofs. 

Proof: 

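(i)$\Rightarrow$(ii) By (i) there is a path $\pi$ between $\alpha$ and $\beta$ in $\mathcal{G}[^S_L$ on which every
non-endpoint vertex is a collider and an ancestor of $Z \cup \{\alpha,\beta\}$ in $\mathcal{G}[^S_L$. Let
the vertices on $\pi$ be denoted by $\langle \omega_0, \ldots, \omega_{n+1} \rangle$, ($\alpha = \omega_0$, $\beta = \omega_{n+1}$).
By Lemma 4.7 $\omega_i \in \operatorname{ant}_{\mathcal{G}}(\{\alpha,\beta\} \cup Z \cup S)$. By Theorem 4.2 there
is a path $\nu_i$ between $\omega_i$ and $\omega_{i+1}$ in $\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\omega_i,\omega_{i+1}\} \cup S)}$ on which every
non-collider is in $L$. The path $\nu_i$ exists in $\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\alpha,\beta\} \cup Z \cup S)}$ as it is a
supergraph of $\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\omega_i,\omega_{i+1}\} \cup S)}$. Let $s$ be the sequence of vertices formed
by concatenating the sequences of vertices on each of the paths $\nu_i$.
(The same vertex may occur more than once in $s$.) Let $\langle \psi_1, \ldots, \psi_r \rangle$
be the subsequence of vertices in $s$ each of which is a non-collider on
some path $\nu_i$ and let $\psi_0 = \alpha$, $\psi_{r+1} = \beta$. Since $\psi_1, \ldots, \psi_r \in L$, it is
sufficient to show that for $0 \le j < r+1$, if $\psi_j \neq \psi_{j+1}$ then $\psi_j - \psi_{j+1}$ in
$\bigl(\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\alpha,\beta\} \cup Z \cup S)}\bigr)^a$. Suppose $\psi_j \neq \psi_{j+1}$; there are now two cases:

(a) $\psi_j$ and $\psi_{j+1}$ both occur on the same path $\nu_i$. In this case $\psi_j$ and
$\psi_{j+1}$ are connected in $\bigl(\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\alpha,\beta\} \cup Z \cup S)}\bigr)^a$ by the augmented edge
corresponding to the collider path $\nu_i(\psi_j, \psi_{j+1})$.

(b) $\psi_j$ and $\psi_{j+1}$ occur on different paths, $\nu_{i_j}$ and $\nu_{i_{j+1}}$. Consider the
subsequence $s(\psi_j, \psi_{j+1})$, denoted by $\langle \phi_0, \phi_1, \ldots, \phi_q, \phi_{q+1} \rangle$, with
$\phi_0 = \psi_j$, $\phi_{q+1} = \psi_{j+1}$. For $1 \le k \le q$ any vertex $\phi_k$ is either on $\nu_i$ or is
an endpoint $\omega_i$ of $\nu_i$ with $i_j < i \le i_{j+1}$. In the former case since $\psi_j$
and $\psi_{j+1}$ are consecutive non-colliders in $s$, $\phi_k$ is a collider on $\nu_i$.
In the latter case by Corollary 4.16, $\phi_{k-1}, \phi_{k+1} \in \operatorname{pa}_{\mathcal{G}}(\omega_i) \cup \operatorname{sp}_{\mathcal{G}}(\omega_i)$
since $\omega_i$ is a collider on $\pi$. Thus for $1 \le k < q$, $\phi_k \leftrightarrow \phi_{k+1}$;
moreover $\psi_j \rightarrow \phi_1$ or $\psi_j \leftrightarrow \phi_1$, and $\phi_q \leftarrow \psi_{j+1}$ or $\phi_q \leftrightarrow \psi_{j+1}$. Hence
$\psi_j$ and $\psi_{j+1}$ are collider connected in $\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\alpha,\beta\} \cup Z \cup S)}$, and
consequently adjacent in $\bigl(\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\alpha,\beta\} \cup Z \cup S)}\bigr)^a$. □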

Applying  the  construction  in  the  previous  proof  to  the  example  in  Figure 
12,  we  have  7r  =  (a,  £,/?)  =  {wo,u,’i,u;2)  in  Q[SL,  hence  n  —  1.  Further,  u0  = 
(a,  7,  lu  l2,  k,  C)  and  ux  =  (C,  k,  l2,  k,  k,  h,  ,8),  hence  s  =  (a,  7.  lu  l2,  k,  (,  k,  k, 
h,h,k,8)-  Now,  {ipo,  ■  ■  • ,  ipf)  =  {a,li,h,k,kJ2Juk,k,/3),  so  r  =  8.  For 
j  7^3,  case  (a)  applies  since  ipj  and  ipj+i  occur  on  the  same  path  up.  for  j  —  3, 

■ipj  =  ipj 

(ii)$\Leftrightarrow$(iii) This follows from Proposition 2.2 together with the definition and
equivalence of m-separation and m*-separation (Theorem 3.18).

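(iii)$\Rightarrow$(i) Let $Z^* = \operatorname{ant}_{\mathcal{G}}(\{\alpha, \beta\} \cup Z \cup S) \setminus (L \cup \{\alpha, \beta\})$, and let $\pi$ be a path which
m-connects $\alpha$ and $\beta$ given $Z^*$ in $\mathcal{G}$. By Lemma 3.13 every non-collider
on $\pi$ is in $\operatorname{ant}_{\mathcal{G}}(\{\alpha, \beta\} \cup Z^*) = \operatorname{ant}_{\mathcal{G}}(\{\alpha, \beta\} \cup Z \cup S)$ by Propositions
2.1(iii) and 2.2(i). Every non-collider on $\pi$ is in $L$ and every collider is
an ancestor of $Z^*$. Let $\langle \psi_1, \ldots, \psi_t \rangle$ denote the sequence of colliders on
$\pi$ that are not in $\operatorname{ant}_{\mathcal{G}}(S)$, and let $\psi_0 = \alpha$ and $\psi_{t+1} = \beta$. For $1 \le i \le t$
let $\phi_i$ be the first vertex in $O$ on a shortest directed path from $\psi_i$ to a
vertex $\zeta_i \in Z^* \setminus \operatorname{ant}_{\mathcal{G}}(S) \subseteq \operatorname{ant}_{\mathcal{G}}(Z \cup \{\alpha, \beta\}) \setminus (\operatorname{ant}_{\mathcal{G}}(S) \cup L)$, denoted
$\nu_i$. Again let $\phi_0 = \alpha$, $\phi_{t+1} = \beta$. Denote the sequence $\langle \phi_0, \ldots, \phi_{t+1} \rangle$ by
$\mathbf{t}$. Finally, let $\mathbf{s}$ be a subsequence of $\mathbf{t}$ constructed as follows:

$i(0) = 0$, so $\phi_{i(0)} = \alpha$;

$i(k+1)$ is the greatest $j > i(k)$ with $\{\phi_{i(k)}, \ldots, \phi_j\} \subseteq \operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k)}, \phi_j\})$.

Note that if $i(k) \le t$ then $i(k+1)$ is guaranteed to exist since

$$\{\phi_{i(k)}, \phi_{i(k)+1}\} \subseteq \operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k)}, \phi_{i(k)+1}\}).$$

In addition, the vertices in $\mathbf{s}$ are distinct. Let $s$ be such that $i(s+1) =
t+1$, so $\phi_{i(s+1)} = \beta$.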
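
The greedy choice of the indices $i(0), i(1), \ldots$ can be sketched in code. This is a minimal illustration, not part of the proof: the function name `greedy_subsequence`, the helper `ancestors_oracle`, and the representation of the anterior relation as a callable are all assumptions made for the example. The helper only covers the DAG case, where anterior vertices coincide with ancestors.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Set

Vertex = Hashable


def ancestors_oracle(parents: Dict[Vertex, Iterable[Vertex]]) -> Callable[[Iterable[Vertex]], Set[Vertex]]:
    """Hypothetical helper: ant(A) for a DAG given as a child -> parents map.

    In a DAG the anterior vertices of A are its ancestors (A included).
    """
    def ant(targets: Iterable[Vertex]) -> Set[Vertex]:
        seen = set(targets)
        stack = list(seen)
        while stack:
            v = stack.pop()
            for p in parents.get(v, ()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen
    return ant


def greedy_subsequence(phi: Sequence[Vertex],
                       ant: Callable[[Iterable[Vertex]], Set[Vertex]]) -> List[int]:
    """Return [i(0), i(1), ..., i(s+1)] for the sequence phi = (phi_0, ..., phi_{t+1}).

    i(0) = 0, and i(k+1) is the greatest j > i(k) such that
    {phi_{i(k)}, ..., phi_j} is contained in ant({phi_{i(k)}, phi_j}).
    """
    last = len(phi) - 1
    indices = [0]
    while indices[-1] < last:
        start = indices[-1]
        best = None
        for j in range(start + 1, last + 1):
            block = set(phi[start:j + 1])
            if block <= ant({phi[start], phi[j]}):
                best = j  # keep the greatest qualifying j
        # j = start + 1 always qualifies provided ant(A) contains A.
        assert best is not None, "ant(A) must contain A"
        indices.append(best)
    return indices


# Usage: with edges x -> a and y -> b, every phi_j up to the end is anterior
# to the pair of endpoints, so the subsequence jumps straight to the end.
ant = ancestors_oracle({"a": ["x"], "b": ["y"]})
print(greedy_subsequence(["a", "x", "y", "b"], ant))
```

The loop terminates because each step strictly increases the last index, mirroring the remark that $i(k+1)$ exists whenever $i(k) \le t$.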

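We now show that there is a path connecting $\phi_{i(k)}$ and $\phi_{i(k+1)}$ in
$\bigl(\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k)},\phi_{i(k+1)}\} \cup S)}\bigr)^a$ on which every vertex except the endpoints is in
$L$: $\phi_{i(k)}$ and $\psi_{i(k)}$ are connected by the path corresponding to $\nu_{i(k)}$ in
$\bigl(\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k)},\phi_{i(k+1)}\} \cup S)}\bigr)^a$, and likewise $\phi_{i(k+1)}$ and $\psi_{i(k+1)}$ are connected by
the path corresponding to $\nu_{i(k+1)}$. In addition, excepting the endpoints
$\phi_{i(k)}$ and $\phi_{i(k+1)}$, every vertex on $\nu_{i(k)}$ and $\nu_{i(k+1)}$ is in $L$. By
construction, every collider on $\pi(\psi_{i(k)}, \psi_{i(k+1)})$ is either in $\operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k)}, \phi_{i(k+1)}\})$
or $\operatorname{ant}_{\mathcal{G}}(S)$. Further, every non-collider $\gamma$ on $\pi(\psi_{i(k)}, \psi_{i(k+1)})$ is
either anterior to $\psi_j$ ($i(k) \le j \le i(k+1)$) or is anterior to a
collider that is in $\operatorname{ant}_{\mathcal{G}}(S)$. Thus every vertex on $\pi(\psi_{i(k)}, \psi_{i(k+1)})$ is in
$\operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k)}, \phi_{i(k+1)}\} \cup S)$, so this path exists in $\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k)},\phi_{i(k+1)}\} \cup S)}$. The
sequence of non-colliders on $\pi(\psi_{i(k)}, \psi_{i(k+1)})$, all of which are in $L$,
connect $\psi_{i(k)}$ and $\psi_{i(k+1)}$ in $\bigl(\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k)},\phi_{i(k+1)}\} \cup S)}\bigr)^a$. It now follows from
Theorem 4.2 (iii)$\Rightarrow$(i) that $\phi_{i(k)}$ and $\phi_{i(k+1)}$ are adjacent in $\mathcal{G}[^S_L$.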

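Next we show that $\phi_0 \rightarrow \phi_{i(1)}$ or $\phi_0 \leftrightarrow \phi_{i(1)}$, $\phi_{i(s)} \leftarrow \phi_{i(s+1)}$ or
$\phi_{i(s)} \leftrightarrow \phi_{i(s+1)}$, and for $1 \le k < s$, $\phi_{i(k)} \leftrightarrow \phi_{i(k+1)}$ in $\mathcal{G}[^S_L$, from which
it follows that $\alpha$ and $\beta$ are collider connected as required. By
construction $\{\phi_{i(k-1)}, \ldots, \phi_{i(k)}\} \subseteq \operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k-1)}, \phi_{i(k)}\})$, hence if $\phi_{i(k)} \in
\operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k-1)}\})$ then

$$\{\phi_{i(k-1)}, \ldots, \phi_{i(k)}, \phi_{i(k)+1}\} \subseteq \operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k-1)}, \phi_{i(k)+1}\}),$$

and thus $i(k)$ is not the greatest $j$ such that

$$\{\phi_{i(k-1)}, \ldots, \phi_j\} \subseteq \operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k-1)}, \phi_j\}).$$

Thus $\phi_{i(k)} \notin \operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k-1)}\})$, ($1 \le k \le s$). Further, since
$\{\phi_{i(k)}, \ldots, \phi_{i(k+1)}\} \subseteq \operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k)}, \phi_{i(k+1)}\})$,
if $\phi_{i(k)} \in \operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k+1)}\})$ then

$$\{\phi_{i(k-1)}, \ldots, \phi_{i(k+1)}\} \subseteq \operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k-1)}, \phi_{i(k+1)}\});$$

but in that case $\phi_{i(k)}$ is not the last such vertex after $\phi_{i(k-1)}$ in $\mathbf{t}$, which is
a contradiction. By construction, $\psi_{i(k)} \in \operatorname{ant}_{\mathcal{G}}(\phi_{i(k)})$ for $1 \le k \le s$, and
$\psi_{i(k)} \notin \operatorname{ant}_{\mathcal{G}}(S)$, so $\phi_{i(k)} \notin \operatorname{ant}_{\mathcal{G}}(S)$. We have now shown that $\phi_{i(k)} \notin
\operatorname{ant}_{\mathcal{G}}(\{\phi_{i(k-1)}, \phi_{i(k+1)}\} \cup S)$, for $1 \le k \le s$. The required orientations
now follow from the definition of $\mathcal{G}[^S_L$.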

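Finally, since $\{\phi_{i(1)}, \ldots, \phi_{i(s)}\} \subseteq \operatorname{ant}_{\mathcal{G}}(Z \cup \{\alpha, \beta\}) \setminus (\operatorname{ant}_{\mathcal{G}}(S) \cup L)$,
it follows by Lemma 4.7 that $\{\phi_{i(1)}, \ldots, \phi_{i(s)}\} \subseteq \operatorname{ant}_{\mathcal{G}[^S_L}(Z \cup \{\alpha, \beta\})$.
Hence every vertex in the sequence $\mathbf{s}$ occurs in $(\mathcal{G}[^S_L)_{\operatorname{ant}_{\mathcal{G}[^S_L}(\{\alpha,\beta\} \cup Z)}$, and
thus $\alpha$ and $\beta$ are collider connected in this graph, as required. □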

We  now  apply  the  construction  in  the  previous  proof  to  the  example 
in  Figure  12.  The  path  7r  =  {a,  7,  l\,  I3 ,l5,  j3)  m-connects  a  and  /3  given 
Z*  =  antg({o:,  f3}  U  Z  U  S)  \  (L  U  {a,  /3})  =  {7,  <5,  s,£}.  It  follows  that 
(W),  VT'02,'03)  =  (<x,h,k,P),  so  t  =  2;  t  =  (00,01,02,03)  =  (a,C,6,p),  vx  = 
(h,k,k,0,  and  ^2  =  It  then  follows  that  s  =  (<Ai(o), 0i(i), ^(2))  = 

(a,  C,P),  so  s  =  l.  For  k  =  0,1  the  graph  (^ante({^(fc)A(fc+1)}us))a  is  the  graph 
shown  in  Figure  12(ii).  Finally,  note  that  t  does  not  constitute  a  collider 
path  between  a  and  /3  in  Q[SL,  though  the  subsequence  s  does,  as  proved. 

We  are  now  ready  to  prove  the  main  result  of  this  section: 

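Theorem 4.18 If $\mathcal{G}$ is an ancestral graph over $V$, and $S \,\dot\cup\, L \subseteq V$, then

$$\mathfrak{I}_m(\mathcal{G})[^S_L = \mathfrak{I}_m(\mathcal{G}[^S_L).$$

Proof: Let $X \,\dot\cup\, Y \,\dot\cup\, Z \subseteq O$. We now argue as follows:

$\langle X, Y \mid Z \rangle \notin \mathfrak{I}_m(\mathcal{G})[^S_L$

$\Leftrightarrow$ $\langle X, Y \mid Z \cup S \rangle \notin \mathfrak{I}_m(\mathcal{G})$

$\Leftrightarrow$ For some $\alpha \in X$, $\beta \in Y$ there is a path $\pi$ connecting $\alpha$ and $\beta$
in $\bigl(\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\alpha,\beta\} \cup Z \cup S)}\bigr)^a$, on which no vertex is in $Z \cup S$.

(*) $\Leftrightarrow$ For some $\alpha \in X$, $\beta \in Y$ there is a path $\mu$ connecting $\alpha$ and $\beta$
in $\bigl((\mathcal{G}[^S_L)_{\operatorname{ant}_{\mathcal{G}[^S_L}(\{\alpha,\beta\} \cup Z)}\bigr)^a$ on which no vertex is in $Z$.

$\Leftrightarrow$ $\langle X, Y \mid Z \rangle \notin \mathfrak{I}_m(\mathcal{G}[^S_L)$
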

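The equivalence (*) is justified thus:

Let the subsequence of vertices on $\pi$ that are in $O$ be denoted $\langle \omega_1, \ldots, \omega_n \rangle$.
Since $\omega_i, \omega_{i+1} \in \operatorname{ant}_{\mathcal{G}}(\{\alpha, \beta\} \cup Z \cup S)$,

$$\bigl(\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\alpha,\beta\} \cup Z \cup S)}\bigr)^a = \bigl(\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\omega_i,\omega_{i+1}\} \cup (\{\alpha,\beta\} \cup Z) \cup S)}\bigr)^a.$$

By Lemma 4.17, $\omega_i$ and $\omega_{i+1}$ are adjacent in

$$\bigl((\mathcal{G}[^S_L)_{\operatorname{ant}_{\mathcal{G}[^S_L}(\{\omega_i,\omega_{i+1}\} \cup (\{\alpha,\beta\} \cup Z))}\bigr)^a,$$

since any vertices occurring between $\omega_i$ and $\omega_{i+1}$ on $\pi$ are in $L$. We now
show by induction that for $1 \le i \le n$, $\omega_i \in \operatorname{ant}_{\mathcal{G}[^S_L}(\{\alpha, \beta\} \cup Z)$. Since $\omega_1 = \alpha$,
the claim holds trivially for $i = 1$. Now suppose that $\omega_i \in \operatorname{ant}_{\mathcal{G}[^S_L}(\{\alpha, \beta\} \cup
Z)$. If $\omega_{i+1} \notin \operatorname{ant}_{\mathcal{G}}(S)$ then by Lemma 4.7 $\omega_{i+1} \in \operatorname{ant}_{\mathcal{G}[^S_L}(\{\alpha, \beta\} \cup Z)$. On
the other hand, if $\omega_{i+1} \in \operatorname{ant}_{\mathcal{G}}(S)$ then by Lemma 4.9, $\omega_{i+1} \in \operatorname{un}_{\mathcal{G}[^S_L}$. It
follows that in $\mathcal{G}[^S_L$ either $\omega_{i+1} - \omega_i$, $\omega_{i+1} \rightarrow \omega_i$, or $\omega_{i+1} \rightarrow \gamma$, where $\gamma$ is a
vertex on a collider path between $\omega_i$ and $\omega_{i+1}$ in $(\mathcal{G}[^S_L)_{\operatorname{ant}_{\mathcal{G}[^S_L}(\{\omega_i,\omega_{i+1}\} \cup (\{\alpha,\beta\} \cup Z))}$.
Consequently, $\omega_{i+1} \in \operatorname{ant}_{\mathcal{G}[^S_L}(\{\omega_i, \alpha, \beta\} \cup Z) = \operatorname{ant}_{\mathcal{G}[^S_L}(\{\alpha, \beta\} \cup Z)$, by the
induction hypothesis. It now follows that for $1 \le i < n$, $\omega_i$ and $\omega_{i+1}$ are
adjacent in

$$\bigl((\mathcal{G}[^S_L)_{\operatorname{ant}_{\mathcal{G}[^S_L}(\{\alpha,\beta\} \cup Z)}\bigr)^a = \bigl((\mathcal{G}[^S_L)_{\operatorname{ant}_{\mathcal{G}[^S_L}(\{\omega_i,\omega_{i+1}\} \cup (\{\alpha,\beta\} \cup Z))}\bigr)^a,$$

hence $\alpha$ and $\beta$ are connected in this graph by a path on which no vertex is
in $Z$.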

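Conversely, suppose that the vertices on $\mu$ are $\langle v_1, \ldots, v_m \rangle$. Since $v_j, v_{j+1} \in
\operatorname{ant}_{\mathcal{G}[^S_L}(\{\alpha, \beta\} \cup Z)$, by Lemma 4.7 $v_j, v_{j+1} \in \operatorname{ant}_{\mathcal{G}}(\{\alpha, \beta\} \cup Z \cup S)$. As $v_j$ and
$v_{j+1}$ are adjacent in

$$\bigl((\mathcal{G}[^S_L)_{\operatorname{ant}_{\mathcal{G}[^S_L}(\{\alpha,\beta\} \cup Z)}\bigr)^a = \bigl((\mathcal{G}[^S_L)_{\operatorname{ant}_{\mathcal{G}[^S_L}(\{v_j,v_{j+1}\} \cup (\{\alpha,\beta\} \cup Z))}\bigr)^a,$$

it follows by Lemma 4.17 that $v_j$ and $v_{j+1}$ are connected by a path $\nu$ in

$$\bigl(\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{v_j,v_{j+1}\} \cup (\{\alpha,\beta\} \cup Z) \cup S)}\bigr)^a = \bigl(\mathcal{G}_{\operatorname{ant}_{\mathcal{G}}(\{\alpha,\beta\} \cup Z \cup S)}\bigr)^a$$

on which no vertex is in $Z \cup S$. Hence $\alpha$ and $\beta$ are also connected by such a
path. □

4.2.10 $\mathcal{G}[^S_L$ is a maximal ancestral graph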

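Corollary 4.19 If $\mathcal{G}$ is an ancestral graph with vertex set $V = O \,\dot\cup\, S \,\dot\cup\, L$ then
$\mathcal{G}[^S_L$ is a maximal ancestral graph.

Proof: By definition there is an edge between $\alpha$ and $\beta$ in $\mathcal{G}[^S_L$ if and only
if for all sets $Z \subseteq O \setminus \{\alpha, \beta\}$, $\langle \{\alpha\}, \{\beta\} \mid Z \cup S \rangle \notin \mathfrak{I}_m(\mathcal{G})$, or
equivalently $\langle \{\alpha\}, \{\beta\} \mid Z \rangle \notin \mathfrak{I}_m(\mathcal{G})[^S_L$. Hence by Theorem 4.18, there is an
edge between $\alpha$ and $\beta$ in $\mathcal{G}[^S_L$ if and only if for all sets $Z \subseteq O \setminus \{\alpha, \beta\}$,
$\langle \{\alpha\}, \{\beta\} \mid Z \rangle \notin \mathfrak{I}_m(\mathcal{G}[^S_L)$. Hence $\mathcal{G}[^S_L$ is maximal. □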


4.2.11  Commutativity 


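Theorem 4.20 If $\mathcal{G}$ is an ancestral graph with vertex set $V$, and $S_1, S_2, L_1, L_2$
are disjoint subsets of $V$, then $\mathcal{G}[^{S_1 \cup S_2}_{L_1 \cup L_2} = (\mathcal{G}[^{S_1}_{L_1})[^{S_2}_{L_2}$. Hence the following
diagram commutes:

[Commutative diagram: $\mathcal{G}$ is mapped to $\mathcal{G}[^{S_1}_{L_1}$ and to $\mathcal{G}[^{S_2}_{L_2}$, and each of these is mapped to $\mathcal{G}[^{S_1 \cup S_2}_{L_1 \cup L_2}$.]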

Figure  11  gives  an  example  of  this  Theorem. 

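Proof: We first show that $\mathcal{G}[^{S_1 \cup S_2}_{L_1 \cup L_2}$ and $(\mathcal{G}[^{S_1}_{L_1})[^{S_2}_{L_2}$ have the same adjacencies.

Let $\alpha, \beta$ be vertices in $V \setminus (S_1 \cup S_2 \cup L_1 \cup L_2)$.

There is an edge between $\alpha$ and $\beta$ in $\mathcal{G}[^{S_1 \cup S_2}_{L_1 \cup L_2}$

$\Leftrightarrow$ $\forall Z \subseteq V \setminus ((S_1 \cup S_2) \cup (L_1 \cup L_2) \cup \{\alpha, \beta\})$, $\langle \{\alpha\}, \{\beta\} \mid Z \cup (S_1 \cup S_2) \rangle \notin \mathfrak{I}_m(\mathcal{G})$

$\Leftrightarrow$ $\forall Z \subseteq (V \setminus (S_1 \cup L_1)) \setminus (S_2 \cup L_2 \cup \{\alpha, \beta\})$, $\langle \{\alpha\}, \{\beta\} \mid Z \cup S_2 \rangle \notin \mathfrak{I}_m(\mathcal{G})[^{S_1}_{L_1}$

(*) $\Leftrightarrow$ $\forall Z \subseteq (V \setminus (S_1 \cup L_1)) \setminus (S_2 \cup L_2 \cup \{\alpha, \beta\})$, $\langle \{\alpha\}, \{\beta\} \mid Z \cup S_2 \rangle \notin \mathfrak{I}_m(\mathcal{G}[^{S_1}_{L_1})$

$\Leftrightarrow$ There is an edge between $\alpha$ and $\beta$ in $(\mathcal{G}[^{S_1}_{L_1})[^{S_2}_{L_2}$.

The equivalence marked (*) follows from Theorem 4.18. Now suppose that
$\alpha$ and $\beta$ are adjacent in $\mathcal{G}[^{S_1 \cup S_2}_{L_1 \cup L_2}$ and $(\mathcal{G}[^{S_1}_{L_1})[^{S_2}_{L_2}$.

$$\begin{aligned}
\alpha \in \operatorname{ant}_{\mathcal{G}[^{S_1 \cup S_2}_{L_1 \cup L_2}}(\beta)
 &\Rightarrow \alpha \in \operatorname{ant}_{\mathcal{G}}(\{\beta\} \cup S_1 \cup S_2)
   && \text{by Lemma 4.7;} \\
 &\Rightarrow \alpha \in \operatorname{ant}_{\mathcal{G}[^{S_1}_{L_1}}(\{\beta\} \cup S_2) \text{ or } \alpha \in \operatorname{un}_{\mathcal{G}[^{S_1}_{L_1}}
   && \text{by Corollary 4.10 and Lemma 4.9;} \\
 &\Rightarrow \alpha \in \operatorname{ant}_{(\mathcal{G}[^{S_1}_{L_1})[^{S_2}_{L_2}}(\beta) \text{ or } \alpha \in \operatorname{un}_{(\mathcal{G}[^{S_1}_{L_1})[^{S_2}_{L_2}}
   && \text{by Corollary 4.10 and Lemma 4.9;} \\
 &\Rightarrow \alpha \in \operatorname{ant}_{(\mathcal{G}[^{S_1}_{L_1})[^{S_2}_{L_2}}(\beta)
   && \text{since } \alpha \text{ and } \beta \text{ are adjacent.}
\end{aligned}$$

Arguing in the other direction,

$$\begin{aligned}
\alpha \in \operatorname{ant}_{(\mathcal{G}[^{S_1}_{L_1})[^{S_2}_{L_2}}(\beta)
 &\Rightarrow \alpha \in \operatorname{ant}_{\mathcal{G}[^{S_1}_{L_1}}(\{\beta\} \cup S_2)
   && \text{by Lemma 4.7;} \\
 &\Rightarrow \alpha \in \operatorname{ant}_{\mathcal{G}}(\{\beta\} \cup S_1 \cup S_2)
   && \text{by Lemma 4.7;} \\
 &\Rightarrow \alpha \in \operatorname{ant}_{\mathcal{G}[^{S_1 \cup S_2}_{L_1 \cup L_2}}(\beta) \text{ or } \alpha \in \operatorname{un}_{\mathcal{G}[^{S_1 \cup S_2}_{L_1 \cup L_2}}
   && \text{by Corollary 4.10 and Lemma 4.9;} \\
 &\Rightarrow \alpha \in \operatorname{ant}_{\mathcal{G}[^{S_1 \cup S_2}_{L_1 \cup L_2}}(\beta)
   && \text{since } \alpha \text{ and } \beta \text{ are adjacent.}
\end{aligned}$$

It then follows from Corollary 3.10 that $\mathcal{G}[^{S_1 \cup S_2}_{L_1 \cup L_2} = (\mathcal{G}[^{S_1}_{L_1})[^{S_2}_{L_2}$ as required. □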


5  Extending  an  Ancestral  Graph 

In  this  section  we  prove  two  extension  results.  We  first  show  that  every 
ancestral  graph  can  be  extended  to  a  maximal  ancestral  graph,  as  stated 
in  Section  3.7.  We  then  show  that  every  maximal  ancestral  graph  may  be 
extended  to  a  complete  ancestral  graph,  and  that  the  edge  additions  may 
be  ordered  so  that  all  the  intermediate  graphs  are  also  maximal.  This  latter 
result parallels well-known results for decomposable undirected graphs (see
Lauritzen (1996), p. 20).

5.1 Extension of an Ancestral Graph to a Maximal Ancestral Graph

Theorem 5.1 If $\mathcal{G}$ is an ancestral graph then there exists a unique maximal
ancestral graph $\bar{\mathcal{G}}$ formed by adding bi-directed edges to $\mathcal{G}$ such that $\mathfrak{I}_m(\mathcal{G}) =
\mathfrak{I}_m(\bar{\mathcal{G}})$.

Figure 13 gives a simple example of this Theorem.

Figure 13: (i) A non-maximal ancestral graph $\mathcal{G}$; (ii) the maximal extension
$\bar{\mathcal{G}}$. (Every pair of non-adjacent vertices in $\bar{\mathcal{G}}$ are m-separated either by $\{c\}$
or $\{d\}$.)

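Proof: Let $\bar{\mathcal{G}} = \mathcal{G}[^{\emptyset}_{\emptyset}$. It follows from Theorem 4.18 and Proposition 4.1(i)
that

$$\mathfrak{I}_m(\bar{\mathcal{G}}) = \mathfrak{I}_m(\mathcal{G}[^{\emptyset}_{\emptyset}) = \mathfrak{I}_m(\mathcal{G})[^{\emptyset}_{\emptyset} = \mathfrak{I}_m(\mathcal{G})$$

as required. If $\alpha$ and $\beta$ are adjacent in $\mathcal{G}$ then trivially there is a path
m-connecting $\alpha$ and $\beta$ given any set $Z \subseteq V \setminus \{\alpha, \beta\}$, hence there is an edge
between $\alpha$ and $\beta$ in $\mathcal{G}[^{\emptyset}_{\emptyset}$. Now, by Corollary 4.8, $\operatorname{ant}_{\mathcal{G}}(\alpha) = \operatorname{ant}_{\mathcal{G}[^{\emptyset}_{\emptyset}}(\alpha)$. Hence
by Lemma 3.9 every edge in $\mathcal{G}$ is inherited by $\bar{\mathcal{G}} = \mathcal{G}[^{\emptyset}_{\emptyset}$. By Corollary 4.19
$\mathcal{G}[^{\emptyset}_{\emptyset}$ is maximal. This establishes the existence of a maximal extension of $\mathcal{G}$.

Let $\bar{\mathcal{G}}$ be a maximal supergraph of $\mathcal{G}$. Suppose $\alpha$ and $\beta$ are adjacent in $\bar{\mathcal{G}}$
but are not adjacent in $\mathcal{G}$. By Corollary 4.3 there is a primitive inducing path
$\pi$ between $\alpha$ and $\beta$ in $\mathcal{G}$, containing more than one edge. Since $\pi$ is present
in $\bar{\mathcal{G}}$, and this graph is maximal, it follows by Corollary 4.6 that $\alpha \leftrightarrow \beta$ in
$\bar{\mathcal{G}}$, as required. This also establishes uniqueness of $\bar{\mathcal{G}}$. □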

Three  Corollaries  are  consequences  of  this  result: 

Corollary 5.2 $\mathcal{G}$ is a maximal ancestral graph if and only if $\mathcal{G} = \mathcal{G}[^{\emptyset}_{\emptyset}$.

Proof: Follows directly from the definition of $\mathcal{G}[^{\emptyset}_{\emptyset}$ and Theorem 5.1. □

The  next  Corollary  establishes  the  Pairwise  Markov  property  referred  to 
in  Section  3.7. 

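Corollary 5.3 If $\mathcal{G}$ is a maximal ancestral graph and $\alpha, \beta$ are not adjacent
in $\mathcal{G}$, then $\langle \{\alpha\}, \{\beta\} \mid \operatorname{ant}_{\mathcal{G}}(\{\alpha, \beta\}) \setminus \{\alpha, \beta\} \rangle \in \mathfrak{I}_m(\mathcal{G})$.

Proof: By Corollary 5.2, $\mathcal{G} = \mathcal{G}[^{\emptyset}_{\emptyset}$. The result then follows by contraposition
from Theorem 4.2, properties (i) and (iv). □

Corollary 5.4 If $\mathcal{G}$ is an ancestral graph, $\alpha \in \operatorname{ant}_{\mathcal{G}}(\beta)$, and $\alpha, \beta$ are not
adjacent in $\mathcal{G}$ then $\langle \{\alpha\}, \{\beta\} \mid \operatorname{ant}_{\mathcal{G}}(\{\alpha, \beta\}) \setminus \{\alpha, \beta\} \rangle \in \mathfrak{I}_m(\mathcal{G})$.

Proof: If $\alpha \in \operatorname{ant}_{\mathcal{G}}(\beta)$ then by Corollary 4.8, $\alpha \in \operatorname{ant}_{\mathcal{G}[^{\emptyset}_{\emptyset}}(\beta)$. Hence there is
no edge $\alpha \leftrightarrow \beta$ in $\mathcal{G}[^{\emptyset}_{\emptyset}$, since by Theorem 4.12, $\mathcal{G}[^{\emptyset}_{\emptyset}$ is ancestral. It follows
from Theorem 5.1 that $\alpha$ and $\beta$ are not adjacent in $\mathcal{G}[^{\emptyset}_{\emptyset}$. The conclusion then
follows from Corollary 5.3. □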

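5.2 Extension of a Maximal Ancestral Graph to a Complete Graph

For an ancestral graph $\mathcal{G} = (V, E)$, the associated complete graph, denoted
$\bar{\bar{\mathcal{G}}}$, is defined as follows:

$\bar{\bar{\mathcal{G}}}$ has vertex set $V$ and an edge between every pair of distinct vertices $\alpha, \beta$,
specified as follows:

$\alpha - \beta$ if $\alpha, \beta \in \operatorname{un}_{\mathcal{G}}$,

$\alpha \rightarrow \beta$ if $\alpha \in \operatorname{un}_{\mathcal{G}} \cup \operatorname{ant}_{\mathcal{G}}(\beta)$ and $\beta \notin \operatorname{un}_{\mathcal{G}}$;

$\alpha \leftrightarrow \beta$ otherwise.

Thus between each pair of distinct vertices in $\bar{\bar{\mathcal{G}}}$ there will be exactly one
edge. Note that although $\bar{\bar{\mathcal{G}}}$ is unique as defined, in general there will be
other complete ancestral graphs of which a given graph $\mathcal{G}$ is a subgraph.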
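
The three edge rules for the associated complete graph can be sketched in code. This is a minimal illustration under stated assumptions: the function names `anterior` and `complete_graph`, and the representation of a mixed graph as three collections of ordered pairs, are choices made for the example. The set of vertices with no arrowhead is computed as those without parents or spouses, and the anterior set is found by searching backwards along undirected edges and against the direction of directed edges.

```python
from itertools import combinations


def anterior(v, undirected, directed):
    """Vertices with an anterior path to v (v itself included)."""
    seen = {v}
    stack = [v]
    while stack:
        w = stack.pop()
        preds = {a for a, b in directed if b == w}       # parents of w
        preds |= {a for a, b in undirected if b == w}     # undirected neighbours
        preds |= {b for a, b in undirected if a == w}
        for p in preds - seen:
            seen.add(p)
            stack.append(p)
    return seen


def complete_graph(vertices, undirected, directed, bidirected):
    """Build the complete graph: one edge per pair, typed by the three rules."""
    # Vertices carrying an arrowhead (children, or endpoints of <-> edges).
    heads = {b for _, b in directed} | {v for e in bidirected for v in e}
    un = set(vertices) - heads
    ant = {v: anterior(v, undirected, directed) for v in vertices}
    out = {"undirected": set(), "directed": set(), "bidirected": set()}
    for a, b in combinations(sorted(vertices), 2):
        if a in un and b in un:
            out["undirected"].add((a, b))
        elif (a in un or a in ant[b]) and b not in un:
            out["directed"].add((a, b))      # a -> b
        elif (b in un or b in ant[a]) and a not in un:
            out["directed"].add((b, a))      # b -> a
        else:
            out["bidirected"].add((a, b))
    return out


# Usage: the DAG a -> b with an isolated vertex c.
print(complete_graph({"a", "b", "c"}, [], [("a", "b")], []))
```

In the usage example the vertices a and c carry no arrowhead, so they are joined by an undirected edge, and both point into b; the original edge a -> b is retained.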

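Lemma 5.5 If $\mathcal{G} = (V, E)$ is an ancestral graph, then: (i) $\mathcal{G}$ is a subgraph
of $\bar{\bar{\mathcal{G}}}$; (ii) $\operatorname{un}_{\bar{\bar{\mathcal{G}}}} = \operatorname{un}_{\mathcal{G}}$; (iii) for all $v \in V$, $\operatorname{ant}_{\bar{\bar{\mathcal{G}}}}(v) = \operatorname{ant}_{\mathcal{G}}(v) \cup \operatorname{un}_{\mathcal{G}}$; (iv) $\bar{\bar{\mathcal{G}}}$ is
an ancestral graph.

Proof: (i) This follows from the construction of $\bar{\bar{\mathcal{G}}}$, Lemma 3.7, and $\operatorname{pa}_{\mathcal{G}}(v) \subseteq
\operatorname{ant}_{\mathcal{G}}(v)$.

(ii) By construction, if $\alpha \in \operatorname{un}_{\mathcal{G}}$ then $\operatorname{pa}_{\bar{\bar{\mathcal{G}}}}(\alpha) \cup \operatorname{sp}_{\bar{\bar{\mathcal{G}}}}(\alpha) = \emptyset$ hence $\alpha \in \operatorname{un}_{\bar{\bar{\mathcal{G}}}}$.
Conversely, if $\alpha \notin \operatorname{un}_{\mathcal{G}}$ then $\operatorname{pa}_{\mathcal{G}}(\alpha) \cup \operatorname{sp}_{\mathcal{G}}(\alpha) \neq \emptyset$. By (i), $\operatorname{pa}_{\bar{\bar{\mathcal{G}}}}(\alpha) \cup \operatorname{sp}_{\bar{\bar{\mathcal{G}}}}(\alpha) \neq \emptyset$,
so $\alpha \notin \operatorname{un}_{\bar{\bar{\mathcal{G}}}}$. Thus $\operatorname{un}_{\bar{\bar{\mathcal{G}}}} = \operatorname{un}_{\mathcal{G}}$ as required.

(iii) By (i), $\operatorname{ant}_{\mathcal{G}}(v) \subseteq \operatorname{ant}_{\bar{\bar{\mathcal{G}}}}(v)$; further, by construction, $\operatorname{un}_{\mathcal{G}} \subseteq \operatorname{ant}_{\bar{\bar{\mathcal{G}}}}(v)$,
thus $\operatorname{ant}_{\mathcal{G}}(v) \cup \operatorname{un}_{\mathcal{G}} \subseteq \operatorname{ant}_{\bar{\bar{\mathcal{G}}}}(v)$. Conversely, if $\alpha \in \operatorname{ant}_{\bar{\bar{\mathcal{G}}}}(v_0)$ then either $\alpha \in
\operatorname{un}_{\bar{\bar{\mathcal{G}}}} = \operatorname{un}_{\mathcal{G}}$ by (ii) or $\alpha \notin \operatorname{un}_{\mathcal{G}}$. In the latter case, by construction of $\bar{\bar{\mathcal{G}}}$ there
is a directed path $\alpha \rightarrow v_n \rightarrow \cdots \rightarrow v_0$ in $\bar{\bar{\mathcal{G}}}$, and every vertex on the path
is in $V \setminus \operatorname{un}_{\mathcal{G}}$. Hence $\alpha \in \operatorname{ant}_{\mathcal{G}}(v_n)$, and $v_i \in \operatorname{ant}_{\mathcal{G}}(v_{i-1})$ ($i = 1, \ldots, n$), so
$\alpha \in \operatorname{ant}_{\mathcal{G}}(v_0)$.

(iv) If $\beta \rightarrow \alpha$ in $\bar{\bar{\mathcal{G}}}$ then, by the construction of $\bar{\bar{\mathcal{G}}}$, $\alpha \notin \operatorname{un}_{\mathcal{G}}$ and $\beta \in
\operatorname{ant}_{\mathcal{G}}(\alpha) \cup \operatorname{un}_{\mathcal{G}}$. Hence, by Lemma 3.8(ii), $\alpha \notin \operatorname{ant}_{\mathcal{G}}(\beta)$ and thus $\alpha \notin \operatorname{ant}_{\bar{\bar{\mathcal{G}}}}(\beta)$,
again by (iii). Similarly, if $\beta \leftrightarrow \alpha$ in $\bar{\bar{\mathcal{G}}}$ then by construction, $\alpha \notin \operatorname{un}_{\mathcal{G}} \cup
\operatorname{ant}_{\mathcal{G}}(\beta)$, hence by (iii), $\alpha \notin \operatorname{ant}_{\bar{\bar{\mathcal{G}}}}(\beta)$. Thus $\alpha \notin \operatorname{ant}_{\bar{\bar{\mathcal{G}}}}(\operatorname{pa}_{\bar{\bar{\mathcal{G}}}}(\alpha) \cup \operatorname{sp}_{\bar{\bar{\mathcal{G}}}}(\alpha))$, so
(i) in the definition of an ancestral graph holds. By the construction of $\bar{\bar{\mathcal{G}}}$, if
$\operatorname{ne}_{\bar{\bar{\mathcal{G}}}}(\alpha) \neq \emptyset$ then $\alpha \in \operatorname{un}_{\mathcal{G}}$, and thus, again by construction, $\operatorname{sp}_{\bar{\bar{\mathcal{G}}}}(\alpha) \cup \operatorname{pa}_{\bar{\bar{\mathcal{G}}}}(\alpha) =
\emptyset$, hence (ii) in the definition holds as required. □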

Theorem 5.6 If $\mathcal{G}$ is a maximal ancestral graph with $r$ pairs of vertices that
are not adjacent, and $\mathcal{G}^*$ is any complete supergraph of $\mathcal{G}$ then there exists a
sequence of maximal ancestral graphs

$$\mathcal{G}^* = \mathcal{G}_0, \ldots, \mathcal{G}_r = \mathcal{G}$$

where $\mathcal{G}_{i+1}$ is a subgraph of $\mathcal{G}_i$ containing one less edge $\epsilon_i$ than $\mathcal{G}_i$, and
$\operatorname{un}_{\mathcal{G}_{i+1}} = \operatorname{un}_{\mathcal{G}_i}$.

The sequence of edges removed, $\langle \epsilon_0, \ldots, \epsilon_{r-1} \rangle$, is such that no undirected
edge is removed after a directed edge and no directed edge is removed after a
bi-directed edge.

Two examples of this Theorem are shown in Figure 14. (The existence
of at least one complete ancestral supergraph $\mathcal{G}^*$ of $\mathcal{G}$ is guaranteed by the
previous Lemma.)

Proof: Let $E$ be the set of edges that are in $\mathcal{G}_0 = \mathcal{G}^*$ but not $\mathcal{G}$. Place an
ordering $\prec$ on $E$ as follows:

(i) if $\alpha - \beta, \gamma \rightarrow \delta \in E$ then $\alpha - \beta \prec \gamma \rightarrow \delta$;

(ii) if $\alpha \rightarrow \beta, \gamma \leftrightarrow \delta \in E$ then $\alpha \rightarrow \beta \prec \gamma \leftrightarrow \delta$;

(iii) if $\alpha \leftrightarrow \beta, \gamma \leftrightarrow \delta \in E$ and $\alpha, \beta \in \operatorname{an}_{\mathcal{G}}(\{\gamma, \delta\})$ then $\alpha \leftrightarrow \beta \prec \gamma \leftrightarrow \delta$.

[Figure 14: panels (i-a), (i-b), (i-c), (i-d) and (ii-a), (ii-b), (ii-c).]

Figure 14: Two simple examples of the extension described in Theorem 5.6.
In (ii) if the $\alpha \leftrightarrow \beta$ edge were added prior to the $\gamma \leftrightarrow \delta$ edge the resulting
graph would not be maximal.

The ordering on bi-directed edges is well-defined by Lemma 3.11. Now let $\mathcal{G}_i$
be the graph formed by removing the first $i$ edges in $E$ under the ordering
$\prec$. Since $\mathcal{G}_0$ is ancestral, it follows from Proposition 3.5 that $\mathcal{G}_i$ is too. Since
$\mathcal{G}_0$ is complete, it is trivially maximal.

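Suppose for a contradiction that $\mathcal{G}_i$ is maximal, but $\mathcal{G}_{i+1}$ is not. Let the
endpoints of $\epsilon_i$ be $\alpha$ and $\beta$. Since, by hypothesis, $\mathcal{G}_i$ is maximal, for any
pair of vertices $\gamma, \delta$ that are not adjacent in $\mathcal{G}_i$, for some set $Z$, ($\gamma, \delta \notin Z$),
$\langle \gamma, \delta \mid Z \rangle \in \mathfrak{I}_m(\mathcal{G}_i) \subseteq \mathfrak{I}_m(\mathcal{G}_{i+1})$ (by Proposition 3.12). Since $\alpha, \beta$ are the only
vertices that are not adjacent in $\mathcal{G}_{i+1}$, but are adjacent in $\mathcal{G}_i$, it follows by
Corollaries 4.3 and 4.4 that there is a primitive inducing path $\pi$ between $\alpha$
and $\beta$ in $\mathcal{G}_{i+1}$ and hence also in $\mathcal{G}_i$.

By Corollary 4.6 it then follows that $\epsilon_i = \alpha \leftrightarrow \beta$ in $\mathcal{G}_i$. Since all
directed edges in $E$ occur prior to $\epsilon_i$, $\operatorname{an}_{\mathcal{G}_{i+1}}(v) = \operatorname{an}_{\mathcal{G}}(v)$ for all $v \in V$. By
Lemma 4.5 every edge on $\pi$ is bi-directed and every vertex on the path is in
$\operatorname{an}_{\mathcal{G}_{i+1}}(\{\alpha, \beta\}) = \operatorname{an}_{\mathcal{G}}(\{\alpha, \beta\})$. It then follows that $\pi$ exists in $\mathcal{G}$ since, if any
edge on $\pi$ were in $E$, it would occur prior to $\epsilon_i$. But in this case, since $\mathcal{G}$ is
maximal, $\epsilon_i$ is present in $\mathcal{G}$, which is a contradiction.

Finally, by Proposition 3.6, $\operatorname{un}_{\mathcal{G}_i} \subseteq \operatorname{un}_{\mathcal{G}_{i+1}}$, as $\mathcal{G}_{i+1}$ is a subgraph of $\mathcal{G}_i$.
By Lemma 5.5(ii), $\operatorname{un}_{\mathcal{G}_r} = \operatorname{un}_{\mathcal{G}} = \operatorname{un}_{\bar{\bar{\mathcal{G}}}} = \operatorname{un}_{\mathcal{G}_0}$, hence $\operatorname{un}_{\mathcal{G}_i} = \operatorname{un}_{\mathcal{G}_{i+1}}$. □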

Note that the proof shows that between $\mathcal{G}$ and any complete supergraph
$\mathcal{G}_0$ of $\mathcal{G}$ there will exist a sequence of maximal graphs, each differing from
the next by a single edge.

6  Canonical  Directed  Acyclic  Graphs 

In this section we show that for every maximal ancestral graph $\mathcal{G}$ there
exists a DAG $\mathcal{D}(\mathcal{G})$ and sets $S$, $L$ such that $\mathcal{D}(\mathcal{G})[^S_L = \mathcal{G}$. This result is
important because it shows that every independence model represented by
an ancestral graph corresponds to some DAG model under marginalizing and
conditioning.

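6.1 The Canonical DAG $\mathcal{D}(\mathcal{G})$ Associated with $\mathcal{G}$

If $\mathcal{G}$ is an ancestral graph with vertex set $V$, then we define the canonical
DAG, $\mathcal{D}(\mathcal{G})$, associated with $\mathcal{G}$ as follows:

(i) let $S_{\mathcal{D}(\mathcal{G})} = \{\sigma_{\alpha\beta} \mid \alpha - \beta \text{ in } \mathcal{G}\}$;

(ii) let $L_{\mathcal{D}(\mathcal{G})} = \{\lambda_{\alpha\beta} \mid \alpha \leftrightarrow \beta \text{ in } \mathcal{G}\}$;

(iii) DAG $\mathcal{D}(\mathcal{G})$ has vertex set $V \cup L_{\mathcal{D}(\mathcal{G})} \cup S_{\mathcal{D}(\mathcal{G})}$ and edge set defined as
follows:

If $\alpha \rightarrow \beta$ in $\mathcal{G}$ then $\alpha \rightarrow \beta$ in $\mathcal{D}(\mathcal{G})$;
if $\alpha \leftrightarrow \beta$ in $\mathcal{G}$ then $\alpha \leftarrow \lambda_{\alpha\beta} \rightarrow \beta$ in $\mathcal{D}(\mathcal{G})$;
if $\alpha - \beta$ in $\mathcal{G}$ then $\alpha \rightarrow \sigma_{\alpha\beta} \leftarrow \beta$ in $\mathcal{D}(\mathcal{G})$.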

Figure  15  shows  an  ancestral  graph  and  the  associated  canonical  DAG. 
Wermuth  et  al.  (1994)  introduced  the  idea  of  transforming  a  graph  into  a 
DAG  in  this  way  by  introducing  additional  ‘synthetic’  variables,  as  a  method 
of  interpreting  particular  dependence  models.  (See  also  Verma  and  Pearl, 
1990.) 
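
The construction of the canonical DAG can be sketched directly from the three edge rules. This is a minimal illustration: the function name `canonical_dag`, the edge-list input format, and the encoding of the synthetic vertices as tuples tagged "sigma" and "lambda" are assumptions made for the example, not notation from the text.

```python
def canonical_dag(vertices, undirected, directed, bidirected):
    """Return (nodes, dag_edges, S, L) for the canonical DAG of a mixed graph.

    Each edge list holds ordered pairs (a, b); dag_edges holds pairs
    (parent, child).
    """
    dag_edges = set(directed)            # a -> b is kept unchanged
    S, L = set(), set()
    for a, b in undirected:              # a - b   becomes  a -> sigma_ab <- b
        s = ("sigma", a, b)
        S.add(s)
        dag_edges |= {(a, s), (b, s)}
    for a, b in bidirected:              # a <-> b becomes  a <- lambda_ab -> b
        lam = ("lambda", a, b)
        L.add(lam)
        dag_edges |= {(lam, a), (lam, b)}
    return set(vertices) | S | L, dag_edges, S, L


# Usage: the ancestral graph a - b -> c <-> d.
nodes, edges, S, L = canonical_dag(
    {"a", "b", "c", "d"}, [("a", "b")], [("b", "c")], [("c", "d")]
)
print(sorted(map(str, edges)))
```

Every synthetic vertex in S is a sink with two parents and every synthetic vertex in L is a source with two children, so no directed cycle can pass through them.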

A minipath is a path in $\mathcal{D}(\mathcal{G})$ containing one or two edges, with endpoints
in $V$, but no other vertices in $V$. The construction of $\mathcal{D}(\mathcal{G})$ sets up a one
to one correspondence between edges in $\mathcal{G}$, and minipaths in $\mathcal{D}(\mathcal{G})$. If $\alpha$ and
$\beta$ are adjacent in $\mathcal{G}$ then denote the corresponding minipath in $\mathcal{D}(\mathcal{G})$, $\delta_{\alpha\beta}$.
Conversely if $\delta$ is a minipath in $\mathcal{D}(\mathcal{G})$, then let $\delta^{\mathcal{G}}$ denote the corresponding
edge in $\mathcal{G}$.

Figure 15: (i) An ancestral graph; (ii) the associated canonical DAG.

Observe that if $\delta_{\alpha\beta}$ and $\delta_{\gamma\delta}$ are minipaths corresponding to two different
adjacencies in $\mathcal{G}$, then no non-endpoint vertices are common to these paths.

Given a path $\mu$ in $\mathcal{D}(\mathcal{G})$, with endpoints in $V$, the path may be
decomposed into a sequence of minipaths $\langle \delta_{\alpha_1\alpha_2}, \ldots, \delta_{\alpha_{n-1}\alpha_n} \rangle$, from which we may
construct a path $\langle \alpha_1, \ldots, \alpha_n \rangle$ in $\mathcal{G}$ by replacing each minipath by the
corresponding edge. We will denote this path by $\mu^{\mathcal{G}}$. Note that since $\mathcal{D}(\mathcal{G})$ is
a DAG, $\operatorname{an}_{\mathcal{D}(\mathcal{G})}(\cdot) = \operatorname{ant}_{\mathcal{D}(\mathcal{G})}(\cdot)$, and by definition a path $\mu$ is m-connecting
if and only if it is d-connecting. Since it helps to make clear that we are
referring to a path in a DAG, we will only use the term 'd-connecting' when
referring to a path which is m-connecting (and d-connecting) in $\mathcal{D}(\mathcal{G})$.
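
The passage from a path in the canonical DAG to the corresponding path in the original graph amounts to dropping the synthetic vertices. The following is a minimal sketch, assuming a path is given as a list of vertices and that the original vertex set is available as a Python set; the function name `project_path` and the tuple encoding of synthetic vertices are hypothetical.

```python
def project_path(path, V):
    """Map a path in the canonical DAG (endpoints in V) to its vertex
    sequence in the original graph by replacing each minipath with an edge."""
    if path[0] not in V or path[-1] not in V:
        raise ValueError("both endpoints must lie in V")
    # Positions of the original vertices along the path.
    positions = [i for i, v in enumerate(path) if v in V]
    # A minipath has one or two edges, so consecutive original vertices
    # are at most two steps apart.
    if any(j - i > 2 for i, j in zip(positions, positions[1:])):
        raise ValueError("path does not decompose into minipaths")
    return [path[i] for i in positions]


# Usage: a -> sigma_ab <- b -> c <- lambda_cd -> d projects to (a, b, c, d).
dag_path = ["a", ("sigma", "a", "b"), "b", "c", ("lambda", "c", "d"), "d"]
print(project_path(dag_path, {"a", "b", "c", "d"}))
```

The check on consecutive positions reflects the definition of a minipath: it contains one or two edges and no interior vertex from the original vertex set.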

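6.1.1 Graphical properties of $\mathcal{D}(\mathcal{G})$

Lemma 6.1 Let $\mathcal{G}$ be an ancestral graph with vertex set $V$.

(i) If $\beta \in V$ then $\operatorname{an}_{\mathcal{D}(\mathcal{G})}(\beta) \cap V = \operatorname{an}_{\mathcal{G}}(\beta)$.

(ii) $\operatorname{an}_{\mathcal{D}(\mathcal{G})}(S_{\mathcal{D}(\mathcal{G})}) = \operatorname{pa}_{\mathcal{D}(\mathcal{G})}(S_{\mathcal{D}(\mathcal{G})}) \cup S_{\mathcal{D}(\mathcal{G})}$, so $\operatorname{an}_{\mathcal{D}(\mathcal{G})}(S_{\mathcal{D}(\mathcal{G})}) \subseteq S_{\mathcal{D}(\mathcal{G})} \cup \operatorname{un}_{\mathcal{G}}$.

(iii) $\operatorname{an}_{\mathcal{D}(\mathcal{G})}(S_{\mathcal{D}(\mathcal{G})}) \cap L_{\mathcal{D}(\mathcal{G})} = \emptyset$.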

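Proof: (i) If $\alpha, \beta \in V$ and $\alpha \in \operatorname{an}_{\mathcal{D}(\mathcal{G})}(\beta)$ then there is a directed path $\delta$
from $\alpha$ to $\beta$ in $\mathcal{D}(\mathcal{G})$. Every non-endpoint vertex on $\delta$ has at least one
parent and at least one child in $\mathcal{D}(\mathcal{G})$, hence every vertex on $\delta$ is in $V$ (since
$\operatorname{ch}_{\mathcal{D}(\mathcal{G})}(S_{\mathcal{D}(\mathcal{G})}) = \emptyset = \operatorname{pa}_{\mathcal{D}(\mathcal{G})}(L_{\mathcal{D}(\mathcal{G})})$). It then follows from the construction of
$\mathcal{D}(\mathcal{G})$ that $\delta$ exists in $\mathcal{G}$, so $\alpha \in \operatorname{an}_{\mathcal{G}}(\beta)$. It also follows from the construction
of $\mathcal{D}(\mathcal{G})$ that any directed path in $\mathcal{G}$ exists in $\mathcal{D}(\mathcal{G})$.

(ii) By construction, $\operatorname{pa}_{\mathcal{D}(\mathcal{G})}(\sigma_{\alpha\beta}) = \{\alpha, \beta\} \subseteq \operatorname{un}_{\mathcal{G}}$ (by Lemma 3.7). But
again, by construction, $\operatorname{pa}_{\mathcal{D}(\mathcal{G})}(\operatorname{un}_{\mathcal{G}}) = \emptyset$. Hence $\operatorname{an}_{\mathcal{D}(\mathcal{G})}(\sigma_{\alpha\beta}) = \{\alpha, \beta, \sigma_{\alpha\beta}\} \subseteq
\operatorname{un}_{\mathcal{G}} \cup \{\sigma_{\alpha\beta}\}$, so $\operatorname{an}_{\mathcal{D}(\mathcal{G})}(S_{\mathcal{D}(\mathcal{G})}) \subseteq \operatorname{un}_{\mathcal{G}} \cup S_{\mathcal{D}(\mathcal{G})}$.

(iii) This follows from the previous property:

$$\operatorname{an}_{\mathcal{D}(\mathcal{G})}(S_{\mathcal{D}(\mathcal{G})}) \cap L_{\mathcal{D}(\mathcal{G})} \subseteq (\operatorname{un}_{\mathcal{G}} \cup S_{\mathcal{D}(\mathcal{G})}) \cap L_{\mathcal{D}(\mathcal{G})} \subseteq (V \cup S_{\mathcal{D}(\mathcal{G})}) \cap L_{\mathcal{D}(\mathcal{G})} = \emptyset$$

□

Note that $\operatorname{ant}_{\mathcal{G}}(\beta) \neq \operatorname{ant}_{\mathcal{D}(\mathcal{G})}(\beta)$ for $\beta \in V$, because an undirected edge
$\alpha - \beta$ in $\mathcal{G}$ is replaced by $\alpha \rightarrow \sigma_{\alpha\beta} \leftarrow \beta$ in $\mathcal{D}(\mathcal{G})$.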

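Lemma 6.2 $\mathcal{G}$ is a subgraph of $\mathcal{D}(\mathcal{G})[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}}$.

Proof: First recall that $\operatorname{an}_{\mathcal{D}(\mathcal{G})}(\cdot) = \operatorname{ant}_{\mathcal{D}(\mathcal{G})}(\cdot)$ since $\mathcal{D}(\mathcal{G})$ is a DAG. We now
consider each of the edges occurring in $\mathcal{G}$:

(i) If $\alpha - \beta$ in $\mathcal{G}$ then $\alpha \rightarrow \sigma_{\alpha\beta} \leftarrow \beta$ in $\mathcal{D}(\mathcal{G})$, so $\alpha, \beta \in \operatorname{ant}_{\mathcal{D}(\mathcal{G})}(S_{\mathcal{D}(\mathcal{G})})$. It
then follows that $\alpha - \beta$ in $\mathcal{D}(\mathcal{G})[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}}$.

(ii) If $\alpha \rightarrow \beta$ in $\mathcal{G}$ then $\alpha \rightarrow \beta$ in $\mathcal{D}(\mathcal{G})$, so $\alpha \in \operatorname{ant}_{\mathcal{D}(\mathcal{G})}(\beta)$. By Lemma
6.1(i), $\beta \notin \operatorname{ant}_{\mathcal{D}(\mathcal{G})}(\alpha)$, and since further, $\beta \notin S_{\mathcal{D}(\mathcal{G})} \cup \operatorname{un}_{\mathcal{G}}$, by Lemma
6.1(ii), $\beta \notin \operatorname{ant}_{\mathcal{D}(\mathcal{G})}(S_{\mathcal{D}(\mathcal{G})})$. It then follows from the definition of the
transformation that $\alpha \rightarrow \beta$ in $\mathcal{D}(\mathcal{G})[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}}$.

(iii) Likewise, if $\alpha \leftrightarrow \beta$ in $\mathcal{G}$ then $\alpha \leftarrow \lambda_{\alpha\beta} \rightarrow \beta$ in $\mathcal{D}(\mathcal{G})$. By Lemma 6.1(i)
and (ii), it follows as in case (ii) that $\beta \notin \operatorname{ant}_{\mathcal{D}(\mathcal{G})}(\{\alpha\} \cup S_{\mathcal{D}(\mathcal{G})})$, and by
symmetry, $\alpha \notin \operatorname{ant}_{\mathcal{D}(\mathcal{G})}(\{\beta\} \cup S_{\mathcal{D}(\mathcal{G})})$. Hence $\alpha \leftrightarrow \beta$ in $\mathcal{D}(\mathcal{G})[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}}$.

□

6.2 The independence model $\mathfrak{I}_m\bigl(\mathcal{D}(\mathcal{G})[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}}\bigr)$

Theorem 6.3 If $\mathcal{G}$ is an ancestral graph then

$$\mathfrak{I}_m(\mathcal{G}) = \mathfrak{I}_m(\mathcal{D}(\mathcal{G}))[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}} = \mathfrak{I}_m\bigl(\mathcal{D}(\mathcal{G})[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}}\bigr).$$
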

It follows from this result that the global Markov property for ancestral
graphs may be reduced to that for DAGs: $X$ is m-separated from $Y$ given $Z$
in $\mathcal{G}$ if and only if $X$ is d-separated from $Y$ given $Z \cup S_{\mathcal{D}(\mathcal{G})}$ in $\mathcal{D}(\mathcal{G})$. (However, see
Section 8.6 for related comments concerning parametrization.)
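
The reduction of m-separation to d-separation can be sketched in code. This is a minimal illustration under stated assumptions: d-separation is tested with the standard moralization criterion (restrict to the ancestors of the three sets, moralize, delete the conditioning set, test connectivity), which is a swapped-in technique rather than a method given in the text, and the names `ancestors`, `d_separated` and `m_separated`, together with the tuple encoding of synthetic vertices, are hypothetical. The sets X, Y and Z are assumed disjoint.

```python
def ancestors(nodes, edges):
    """All ancestors of `nodes` (inclusive) in a DAG given as (parent, child) pairs."""
    seen = set(nodes)
    stack = list(seen)
    while stack:
        v = stack.pop()
        for a, b in edges:
            if b == v and a not in seen:
                seen.add(a)
                stack.append(a)
    return seen


def d_separated(edges, X, Y, Z):
    """Moralization test for d-separation of X and Y given Z in a DAG."""
    keep = ancestors(set(X) | set(Y) | set(Z), edges)
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    adj = {v: set() for v in keep}
    for a, b in sub:                       # drop directions
        adj[a].add(b)
        adj[b].add(a)
    for child in keep:                     # marry parents of a common child
        parents = [a for a, b in sub if b == child]
        for p in parents:
            adj[p] |= set(parents) - {p}
    blocked = set(Z)
    frontier = [x for x in X if x not in blocked]
    seen = set(frontier)
    while frontier:                        # search that avoids Z
        v = frontier.pop()
        for w in adj[v] - blocked - seen:
            seen.add(w)
            frontier.append(w)
    return not (seen & set(Y))


def m_separated(undirected, directed, bidirected, X, Y, Z):
    """m-separation in an ancestral graph via d-separation in its canonical DAG."""
    edges, S = set(directed), set()
    for a, b in undirected:                # a - b   ->  a -> sigma <- b
        s = ("sigma", a, b)
        S.add(s)
        edges |= {(a, s), (b, s)}
    for a, b in bidirected:                # a <-> b ->  a <- lambda -> b
        lam = ("lambda", a, b)
        edges |= {(lam, a), (lam, b)}
    return d_separated(edges, X, Y, set(Z) | S)


# Usage: in the undirected chain a - b - c, b separates a from c.
chain = [("a", "b"), ("b", "c")]
print(m_separated(chain, [], [], {"a"}, {"c"}, {"b"}))
print(m_separated(chain, [], [], {"a"}, {"c"}, set()))
```

Conditioning on the synthetic sink vertices opens every collider created from an undirected edge, while the synthetic source vertices are never conditioned on, which is how the single DAG reproduces both the conditioning and the marginalizing behaviour.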

It  also  follows  from  this  result  that  the  class  of  independence  models 
associated  with  ancestral  graphs  is  the  smallest  class  that  contains  the  DAG 
independence  models  and  is  closed  under  marginalizing  and  conditioning. 

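We break the proof into three pieces:

Proof: $\mathfrak{I}_m(\mathcal{D}(\mathcal{G}))[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}} = \mathfrak{I}_m\bigl(\mathcal{D}(\mathcal{G})[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}}\bigr)$ by Theorem 4.18. □

Proof: $\mathfrak{I}_m(\mathcal{G}) \subseteq \mathfrak{I}_m(\mathcal{D}(\mathcal{G}))[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}}$

Suppose $\mathcal{G}$ has vertex set $V$, containing vertices $\alpha, \beta$, and set $Z$ ($\alpha, \beta \notin Z$).
It is sufficient to prove that if there is a path $\mu$ which d-connects $\alpha$ and $\beta$
given $Z \cup S_{\mathcal{D}(\mathcal{G})}$ in $\mathcal{D}(\mathcal{G})$ then $\mu^{\mathcal{G}}$ m-connects $\alpha$ and $\beta$ given $Z$ in $\mathcal{G}$.

Suppose that $\gamma$ is a collider on $\mu^{\mathcal{G}}$. In this case $\gamma$ is a collider on $\mu$ since
the corresponding minipaths collide at $\gamma$ in $\mathcal{D}(\mathcal{G})$. Since $\mu$ is d-connecting
given $Z \cup S_{\mathcal{D}(\mathcal{G})}$ and $\gamma \in V$,

$$\gamma \in \bigl(\operatorname{an}_{\mathcal{D}(\mathcal{G})}(Z \cup S_{\mathcal{D}(\mathcal{G})})\bigr) \cap V = \bigl(\operatorname{an}_{\mathcal{D}(\mathcal{G})}(Z) \cap V\bigr) \cup \bigl(\operatorname{an}_{\mathcal{D}(\mathcal{G})}(S_{\mathcal{D}(\mathcal{G})}) \cap V\bigr),$$

by Proposition 2.1. But $\gamma \notin \operatorname{un}_{\mathcal{G}}$, so by Lemma 6.1(ii), $\gamma \notin \operatorname{an}_{\mathcal{D}(\mathcal{G})}(S_{\mathcal{D}(\mathcal{G})})$.
Hence $\gamma \in (\operatorname{an}_{\mathcal{D}(\mathcal{G})}(Z) \cap V) = \operatorname{an}_{\mathcal{G}}(Z)$, the equality following from Lemma
6.1(i).

If $\gamma$ is a non-collider on $\mu^{\mathcal{G}}$ then $\gamma$ is a non-collider on $\mu$, so $\gamma \notin Z \cup S_{\mathcal{D}(\mathcal{G})}$,
thus $\gamma \notin Z$ as required. □

Proof: $\mathfrak{I}_m\bigl(\mathcal{D}(\mathcal{G})[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}}\bigr) \subseteq \mathfrak{I}_m(\mathcal{G})$

By Lemma 6.2 $\mathcal{G}$ is a subgraph of $\mathcal{D}(\mathcal{G})[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}}$; the result then follows by
Proposition 3.12. □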


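6.2.1 If $\mathcal{G}$ is maximal then $\mathcal{D}(\mathcal{G})[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}} = \mathcal{G}$

We now prove the result mentioned at the start of this section:

Theorem 6.4 If $\mathcal{G}$ is a maximal ancestral graph then

$$\mathcal{D}(\mathcal{G})[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}} = \mathcal{G}.$$


Proof: By Lemma 6.2, $\mathcal{G}$ is a subgraph of $\mathcal{D}(\mathcal{G})[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}}$, while by Theorem 6.3
these graphs correspond to the same independence model. It then follows
from the maximality of $\mathcal{G}$ that $\mathcal{D}(\mathcal{G})[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}} = \mathcal{G}$. □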

7  Probability  Distributions 

In  this  section  we  relate  the  operations  of  marginalizing  and  conditioning 
that  have  been  defined  for  independence  models  and  graphs  to  probability 
distributions. 

7.1  Marginalizing  and  Conditioning  Distributions 

For a graph $\mathcal{G}$ with vertex set $V$ we consider collections of random variables
$(X_v)_{v \in V}$ taking values in probability spaces $(\mathfrak{X}_v)_{v \in V}$. In all the examples
we consider, the probability spaces are either real finite-dimensional vector
spaces or finite discrete sets. For $A \subseteq V$ we let $\mathfrak{X}_A = \times_{v \in A}(\mathfrak{X}_v)$, $\mathfrak{X} = \mathfrak{X}_V$
and $X_A = (X_v)_{v \in A}$.

If $P$ is a probability measure on $\mathfrak{X}_V$ then as usual we define the distribution
after marginalizing over $X_L$, here denoted $P[_{X_L}$ or $P_{X_{V \setminus L}}$, to be a
probability measure on $\mathfrak{X}_{V \setminus L}$, such that

$$P[_{X_L}(E) = P_{X_{V \setminus L}}(E) = P\big((X_{V \setminus L}, X_L) \in E \times \mathfrak{X}_L\big).$$

We will assume the existence of a regular conditional probability measure,
denoted $P[^{X_S = x_S}(\cdot)$ or $P(\cdot \mid X_S = x_S)$, for all $x_S \in \mathfrak{X}_S$ so that

$$\int_F P[^{X_S = x_S}(E)\, dP_{X_S}(x_S) = P\big((X_{V \setminus S}, X_S) \in E \times F\big).$$

This defines $P[^{X_S = x_S}(\cdot)$ up to almost sure equivalence under $P_{X_S}$. Likewise
we define

$$P[^{X_S = x_S}_{X_L}(\cdot) = \big(P[^{X_S = x_S}\big)[_{X_L}(\cdot).$$

7.2 The Set of Distributions Obeying an Independence Model ($\mathcal{P}(\mathfrak{I})$)

We  define  conditional  independence  under  P  as  follows: 


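$$A \perp\!\!\!\perp B \mid C \; [P] \quad \text{if} \quad P[^{X_C = x_C}_{X_{V \setminus (A \cup C)}}(\cdot) = P[^{X_B = x_B,\, X_C = x_C}_{X_{V \setminus (A \cup B \cup C)}}(\cdot) \quad (P_{X_{B \cup C}}\ \text{a.e.}),$$

where we have used the usual shorthand notation: $A$ denotes both a vertex
set and the random variable $X_A$.

For an independence model $\mathfrak{I}$ over $V$ let $\mathcal{P}(\mathfrak{I})$ be the set of distributions
$P$ on $\mathfrak{X}$ such that for arbitrary disjoint sets $A$, $B$, $Z$ ($Z$ may be empty),

$$\text{if } (A, B \mid Z) \in \mathfrak{I} \text{ then } A \perp\!\!\!\perp B \mid Z \; [P].$$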


Note that if $P \in \mathcal{P}(\mathfrak{I})$ then there may be independence relations that are
not in $\mathfrak{I}$ that also hold in $P$.

A distribution $P$ is said to be faithful or Markov perfect with respect to
an independence model $\mathfrak{I}$ if

$$(A, B \mid Z) \in \mathfrak{I} \text{ if and only if } A \perp\!\!\!\perp B \mid Z \; [P].$$

An independence model $\mathfrak{I}$ is said to be probabilistic if there is a distribution
$P$ that is faithful to $\mathfrak{I}$.


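7.3 Relating $\mathcal{P}(\mathfrak{I}_m(\mathcal{G}))$ and $\mathcal{P}(\mathfrak{I}_m(\mathcal{G}[^{S}_{L}))$

Theorem 7.1 Let $\mathfrak{I}$ be an independence model over $V$ with $S \mathbin{\dot\cup} L \subseteq V$. If
$P \in \mathcal{P}(\mathfrak{I})$ then

$$P[^{X_S = x_S}_{X_L} \in \mathcal{P}(\mathfrak{I}[^{S}_{L}) \quad (P_{X_S}\ \text{a.e.}).$$

Proof: Suppose $(X, Y \mid Z) \in \mathfrak{I}[^{S}_{L}$. It follows that $(X, Y \mid Z \cup S) \in \mathfrak{I}$ and
$(X \cup Y \cup Z) \subseteq V \setminus (S \cup L)$. Hence, if $P \in \mathcal{P}(\mathfrak{I})$ and $(X, Y \mid Z) \in \mathfrak{I}[^{S}_{L}$ then

$$X \perp\!\!\!\perp Y \mid Z \cup S \; [P],$$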

hence

(The last step follows from the assumption that regular conditional probability
measures exist. See Koster (1999a) Appendix A & B.) Since there are
finitely many triples $(X, Y \mid Z) \in \mathfrak{I}[^{S}_{L}$, it follows that

$$P[^{X_S = x_S}_{X_L} \in \mathcal{P}(\mathfrak{I}[^{S}_{L}) \quad (P_{X_S}\ \text{a.e.}),$$

as required. □

Two  corollaries  follow  from  this  result: 


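Corollary 7.2 If $\mathcal{G}$ is an ancestral graph and $P \in \mathcal{P}(\mathfrak{I}_m(\mathcal{G}))$ then

$$P[^{X_S = x_S}_{X_L} \in \mathcal{P}(\mathfrak{I}_m(\mathcal{G})[^{S}_{L}) = \mathcal{P}(\mathfrak{I}_m(\mathcal{G}[^{S}_{L})) \quad (P_{X_S}\ \text{a.e.}).$$

Proof: This follows directly from Theorem 7.1 and Theorem 4.18. □


Corollary 7.3 If $N$ is a normal distribution, faithful to an independence
model $\mathfrak{I}$ over vertex set $V$, then $N[^{X_S = x_S}_{X_L}$ is faithful to $\mathfrak{I}[^{S}_{L}$.


Proof: Since $N \in \mathcal{P}(\mathfrak{I})$, by normality and Theorem 7.1, $N[^{X_S = x_S}_{X_L} \in \mathcal{P}(\mathfrak{I}[^{S}_{L})$.
Now suppose $(X, Y \mid Z) \notin \mathfrak{I}[^{S}_{L}$ where $X \cup Y \cup Z \subseteq V \setminus (S \cup L)$. Hence
$(X, Y \mid Z \cup S) \notin \mathfrak{I}$. Since $N$ is faithful to $\mathfrak{I}$,

$$X \not\perp\!\!\!\perp Y \mid Z \cup S \; [N] \quad \text{which implies} \quad X \not\perp\!\!\!\perp Y \mid Z \; \big[N[^{X_S = x_S}_{X_L}\big]$$

for any $x_S \in \mathbb{R}^{|S|}$, by standard properties of the normal distribution. □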

Note  that  the  analogous  result  is  not  true  for  the  multinomial  distribution 
as  context-specific  (or  asymmetric)  independence  relations  may  be  present. 
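
The "standard properties of the normal distribution" used in Corollary 7.3 can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the function name `condition_then_marginalize`, the vertex labels 0 to 3 and the covariance matrix are all illustrative assumptions. It conditions a Gaussian on $X_S = x_S$ and marginalizes over $X_L$; the resulting covariance matrix does not depend on the value $x_S$, which is why (in)dependence statements given $Z \cup S$ carry over to every conditional distribution.

```python
import numpy as np

def condition_then_marginalize(mu, Sigma, S, L, x_S):
    """Mean and covariance of the remaining coordinates of N(mu, Sigma)
    after conditioning on X_S = x_S and marginalizing over X_L."""
    S, L = list(S), list(L)
    keep = [i for i in range(len(mu)) if i not in S and i not in L]
    # Regression coefficients of the kept coordinates on X_S.
    K = Sigma[np.ix_(keep, S)] @ np.linalg.inv(Sigma[np.ix_(S, S)])
    mean = mu[keep] + K @ (np.asarray(x_S, dtype=float) - mu[S])
    # Schur complement: conditional covariance, free of x_S.
    cov = Sigma[np.ix_(keep, keep)] - K @ Sigma[np.ix_(S, keep)]
    return mean, cov

# Illustrative (assumed) positive definite covariance on four vertices.
mu = np.zeros(4)
Sigma = np.array([[2.0, 0.6, 0.3, 0.5],
                  [0.6, 1.5, 0.2, 0.4],
                  [0.3, 0.2, 1.0, 0.1],
                  [0.5, 0.4, 0.1, 1.2]])

mean_a, cov_a = condition_then_marginalize(mu, Sigma, S=[3], L=[2], x_S=[0.0])
mean_b, cov_b = condition_then_marginalize(mu, Sigma, S=[3], L=[2], x_S=[5.0])
```

Only the conditional mean changes with $x_S$. For a multinomial distribution no such invariance holds, which is consistent with the remark about context-specific independence.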


Figure 16: (i) An ancestral graph $\mathcal{G}$; (ii) the graph $\mathcal{G}[^{\emptyset}_{\{\psi\}}$. (See Section 7.3.1.)

7.3.1 A non-independence restriction

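The following example due to Verma and Pearl (1991), Robins (1997) shows
that there are distributions $Q \in \mathcal{P}(\mathfrak{I}_m(\mathcal{G})[^{S}_{L})$ for which there is no distribution
$P \in \mathcal{P}(\mathfrak{I}_m(\mathcal{G}))$ such that $Q = P[^{X_S = x_S}_{X_L}$. In other words, a set of distributions
defined via a set of independence relations may impose constraints on a given
margin that are not independence relations.

Consider the graph $\mathcal{G}$ in Figure 16(i). Marginalizing over $\psi$ produces
the complete graph shown in Figure 16(ii), so $\mathcal{P}(\mathfrak{I}_m(\mathcal{G}[^{\emptyset}_{\{\psi\}}))$ is the saturated
model containing every distribution over $\{\alpha, \beta, \gamma, \delta\}$. However, if
$P \in \mathcal{P}(\mathfrak{I}_m(\mathcal{G}))$ then, almost surely under $P(X_\alpha, X_\gamma)$,

$$\begin{aligned}
&\int_{\mathfrak{X}_\beta} P(X_\delta \mid x_\alpha, x_\beta, x_\gamma)\, dP(x_\beta \mid x_\alpha) \\
&\quad= \int_{\mathfrak{X}_\beta} \int_{\mathfrak{X}_\psi} P(X_\delta \mid x_\alpha, x_\beta, x_\gamma, x_\psi)\, dP(x_\psi \mid x_\alpha, x_\beta, x_\gamma)\, dP(x_\beta \mid x_\alpha) \\
&\quad= \int_{\mathfrak{X}_\beta} \int_{\mathfrak{X}_\psi} P(X_\delta \mid x_\alpha, x_\beta, x_\gamma, x_\psi)\, dP(x_\psi \mid x_\alpha, x_\beta)\, dP(x_\beta \mid x_\alpha) && \text{since } \gamma \perp\!\!\!\perp \psi \mid \{\alpha, \beta\} \\
&\quad= \int_{\mathfrak{X}_\beta \times \mathfrak{X}_\psi} P(X_\delta \mid x_\alpha, x_\beta, x_\gamma, x_\psi)\, dP(x_\beta, x_\psi \mid x_\alpha) \\
&\quad= \int_{\mathfrak{X}_\beta \times \mathfrak{X}_\psi} P(X_\delta \mid x_\alpha, x_\gamma, x_\psi)\, dP(x_\beta, x_\psi \mid x_\alpha) && \text{since } \beta \perp\!\!\!\perp \delta \mid \{\alpha, \gamma, \psi\} \\
&\quad= \int_{\mathfrak{X}_\psi} P(X_\delta \mid x_\alpha, x_\gamma, x_\psi)\, dP(x_\psi \mid x_\alpha) \\
&\quad= \int_{\mathfrak{X}_\psi} P(X_\delta \mid x_\gamma, x_\psi)\, dP(x_\psi) && \text{since } \alpha \perp\!\!\!\perp \psi,\ \alpha \perp\!\!\!\perp \delta \mid \{\gamma, \psi\}.
\end{aligned}$$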


This will not hold in general for an arbitrary distribution since the last
expression is not a function of $x_\alpha$. However, faithfulness is preserved under
marginalization for arbitrary distributions.

7.4 Independence Models for Ancestral Graphs Are Probabilistic

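The existence of distributions that are faithful to $\mathfrak{I}_m(\mathcal{G})$ for an ancestral
graph $\mathcal{G}$ follows from the corresponding result for DAGs:

Theorem 7.4 (Building on results of Geiger, 1990, Geiger and Pearl, 1990,
Frydenberg, 1990b, Spirtes et al., 1993 and Meek, 1995b.)

For an arbitrary DAG, $\mathcal{D}$, $\mathfrak{I}_m(\mathcal{D})$ is probabilistic; in particular there is a
normal distribution that is faithful to $\mathfrak{I}_m(\mathcal{D})$.

Theorem 7.5 If $\mathcal{G}$ is an ancestral graph then $\mathfrak{I}_m(\mathcal{G})$ is probabilistic; in
particular there is a normal distribution which is faithful to $\mathfrak{I}_m(\mathcal{G})$.

Proof: By Theorem 6.3 there is a DAG $\mathcal{D}(\mathcal{G})$ such that

$$\mathfrak{I}_m(\mathcal{G}) = \mathfrak{I}_m\big(\mathcal{D}(\mathcal{G})[^{S_{\mathcal{D}(\mathcal{G})}}_{L_{\mathcal{D}(\mathcal{G})}}\big).$$

By Theorem 7.4 there is a normal distribution $N$ that is faithful to $\mathfrak{I}_m(\mathcal{D}(\mathcal{G}))$.
By Corollary 7.3, writing $S = S_{\mathcal{D}(\mathcal{G})}$ and $L = L_{\mathcal{D}(\mathcal{G})}$, $N[^{X_S = x_S}_{X_L}$ is faithful to
$\mathfrak{I}_m(\mathcal{D}(\mathcal{G}))[^{S}_{L} = \mathfrak{I}_m(\mathcal{D}(\mathcal{G})[^{S}_{L}) = \mathfrak{I}_m(\mathcal{G})$. □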


7.4.1  Completeness  of  the  global  Markov  property 

A graphical separation criterion $C$ is said to be complete if for any graph $\mathcal{G}$
and independence model $\mathfrak{I}^*$,

$$\text{if } \mathfrak{I}_C(\mathcal{G}) \subseteq \mathfrak{I}^* \text{ and } \mathcal{P}(\mathfrak{I}_C(\mathcal{G})) = \mathcal{P}(\mathfrak{I}^*) \text{ then } \mathfrak{I}_C(\mathcal{G}) = \mathfrak{I}^*.$$

In other words, the independence model $\mathfrak{I}_C(\mathcal{G})$ (see Section 2.1.1) cannot be
extended without changing the associated set of distributions $\mathcal{P}(\mathfrak{I}_C(\mathcal{G}))$.

Theorem 7.6 The global Markov property for ancestral graphs is complete.

Proof: The existence of a distribution that is faithful to $\mathfrak{I}_m(\mathcal{G})$ is clearly a
sufficient condition for completeness. □

8 Gaussian Parametrization


There is a natural parametrization of the set of all non-singular normal
distributions satisfying the independence relations in $\mathfrak{I}_m(\mathcal{G})$. In the following
sections we first introduce the parametrization, then define the set of normal
distributions satisfying the relations in the independence model, and then
prove equivalence.

Let $N_p(\mu, \Sigma)$ denote a $p$-dimensional multivariate normal distribution
with mean $\mu$ and covariance matrix $\Sigma$. Likewise let $\mathcal{N}_p$ be the set of all
such distributions, with non-singular covariance matrices.

Throughout this section we find it useful to make the following convention:
$\Sigma_A^{-1} = (\Sigma_{AA})^{-1}$, where $\Sigma_{AA}$ is the submatrix of $\Sigma$ restricted to $A$.

8.1  Parametrization 

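A Gaussian parametrization of an ancestral graph $\mathcal{G}$, with vertex set $V$ and
edge set $E$, is a pair $(\mu, \Phi)$, consisting of a mean function

$$\mu : V \rightarrow \mathbb{R}$$

which assigns a number to every vertex, together with a covariance function

$$\Phi : V \cup E \rightarrow \mathbb{R}$$

which assigns a number to every edge and vertex in $\mathcal{G}$, subject to the
restriction that the matrices $\Lambda$, $\Omega$ defined below are positive definite (p.d.):

$$(\Lambda)_{\alpha\beta} = \lambda_{\alpha\beta} = \begin{cases} \Phi(\alpha) & \text{if } \alpha = \beta, \\ \Phi(\alpha - \beta) & \text{if } \alpha - \beta \text{ in } \mathcal{G}, \\ 0 & \text{otherwise;} \end{cases} \qquad \alpha, \beta \in \mathrm{un}_{\mathcal{G}},$$

$$(\Omega)_{\alpha\beta} = \omega_{\alpha\beta} = \begin{cases} \Phi(\alpha) & \text{if } \alpha = \beta, \\ \Phi(\alpha \leftrightarrow \beta) & \text{if } \alpha \leftrightarrow \beta \text{ in } \mathcal{G}, \\ 0 & \text{otherwise.} \end{cases} \qquad \alpha, \beta \in V \setminus \mathrm{un}_{\mathcal{G}}.$$

Let $\mathbf{\Phi}(\mathcal{G})$ be the set of all such parametrizations $(\mu, \Phi)$ for $\mathcal{G}$. We further
define:

$$(B)_{\alpha\beta} = b_{\alpha\beta} = \begin{cases} 1 & \text{if } \alpha = \beta, \\ \Phi(\alpha \leftarrow \beta) & \text{if } \alpha \leftarrow \beta \text{ in } \mathcal{G}, \\ 0 & \text{otherwise.} \end{cases} \qquad \alpha, \beta \in V.$$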


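Proposition 8.1 If $\Lambda$ and $\Omega$ are given by a parametrization of $\mathcal{G}$ then

(i) $\Lambda$, $\Omega$ are symmetric;

(ii) for $\nu \in \mathrm{un}_{\mathcal{G}}$, $\lambda_{\nu\nu} > 0$ and for $\nu \in V \setminus \mathrm{un}_{\mathcal{G}}$, $\omega_{\nu\nu} > 0$.

Proof: Both properties follow from the requirement that $\Lambda$, $\Omega$ be positive
definite. □

Proposition 8.2 Let $\mathcal{G}$ be an ancestral graph with vertices $V$, edges $E$. The
values taken by $\Phi(\cdot)$ on the sets $\mathrm{un}_{\mathcal{G}} \cup \{\alpha - \beta \in E\}$, $(V \setminus \mathrm{un}_{\mathcal{G}}) \cup \{\alpha \leftrightarrow \beta \in
E\}$ and $\{\alpha \rightarrow \beta \in E\}$ are variation independent as $\Phi(\cdot)$ varies in $\mathbf{\Phi}(\mathcal{G})$.
Likewise, $\mu(\cdot)$ and $\Phi(\cdot)$ are variation independent.

Proof: Follows directly from the definition of a parametrization. □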

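Lemma 8.3 Let $\mathcal{G}$ be an ancestral graph with vertex set $V$. Further, let $\prec$
be an arbitrary ordering of $V$ such that all vertices in $\mathrm{un}_{\mathcal{G}}$ precede those in
$V \setminus \mathrm{un}_{\mathcal{G}}$, and $\alpha \in \mathrm{an}(\beta) \setminus \{\beta\}$ implies $\alpha \prec \beta$. Under such an ordering, the
matrix $B$ given by a parametrization of $\mathcal{G}$ has the form:

$$B = \begin{pmatrix} I & 0 \\ B_{du} & B_{dd} \end{pmatrix}, \quad \text{and} \quad B^{-1} = \begin{pmatrix} I & 0 \\ -B_{dd}^{-1} B_{du} & B_{dd}^{-1} \end{pmatrix},$$

where $B_{dd}$ is lower triangular, with diagonal entries equal to 1. Hence $B$ is
lower triangular and non-singular, as is $B^{-1}$.

Note that we use $u$, $d$ as abbreviations for $\mathrm{un}_{\mathcal{G}}$, $V \setminus \mathrm{un}_{\mathcal{G}}$ respectively.

Proof: If $\alpha, \beta \in \mathrm{un}_{\mathcal{G}}$ then since $\mathcal{G}$ is ancestral, $\alpha \notin \mathrm{ch}_{\mathcal{G}}(\beta)$ and vice versa.
Hence by definition of $B$, $b_{\alpha\beta} = \delta_{\alpha\beta}$ (where $\delta$ is Kronecker's delta
function). If $\alpha \in \mathrm{un}_{\mathcal{G}}$, $\beta \in V \setminus \mathrm{un}_{\mathcal{G}}$ then $\alpha \notin \mathrm{ch}_{\mathcal{G}}(\beta)$, since $\mathcal{G}$ is ancestral, hence
$b_{\alpha\beta} = 0$. If $\alpha, \beta \in V \setminus \mathrm{un}_{\mathcal{G}}$, and $\alpha = \beta$ then $b_{\alpha\beta} = 1$ by definition. If
$\alpha \neq \beta$, and $b_{\alpha\beta} \neq 0$ then $\alpha \in \mathrm{ch}_{\mathcal{G}}(\beta)$, so $\beta \prec \alpha$. Finally, since $\mathcal{G}$ is ancestral,
$\beta \notin \mathrm{ch}_{\mathcal{G}}(\alpha)$, so $b_{\beta\alpha} = 0$ as required. □

8.1.1 Definition of the Gaussian model ($\mathcal{N}(\mathcal{G})$)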

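A parametrization $(\mu, \Phi)$ of $\mathcal{G}$ specifies a Gaussian distribution as follows:

$$N_{\mathcal{G}\mu\Phi} = N_{|V|}(\mu, \Sigma_{\mathcal{G}\Phi})$$

where

$$(\mu)_\alpha = \mu(\alpha), \quad \text{and} \quad \Sigma_{\mathcal{G}\Phi} = B^{-1} \begin{pmatrix} \Lambda^{-1} & 0 \\ 0 & \Omega \end{pmatrix} B^{-T}. \qquad (1)$$

The Gaussian model, $\mathcal{N}(\mathcal{G})$, associated with $\mathcal{G}$ is the set of normal
distributions obtained from parametrizations of $\mathcal{G}$:

$$\mathcal{N}(\mathcal{G}) = \{ N_{\mathcal{G}\mu\Phi} \mid (\mu, \Phi) \in \mathbf{\Phi}(\mathcal{G}) \}.$$

The mean function $\mu$ does not play a significant role in what follows.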
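
As an illustration of equation (1), the sketch below assembles the covariance matrix from the three matrices of a parametrization. It is a minimal example under stated assumptions, not code from the paper: the function name `gaussian_covariance`, the four-vertex graph and all numerical values are hypothetical, and vertices are assumed to be ordered as in Lemma 8.3 (vertices of $\mathrm{un}_{\mathcal{G}}$ first, ancestors before descendants).

```python
import numpy as np

def gaussian_covariance(Lam, Omega, B):
    """Sigma = B^{-1} diag(Lam^{-1}, Omega) B^{-T}, as in equation (1).

    Vertices are assumed ordered with un_G first, then the remaining vertices.
    """
    u, p = Lam.shape[0], B.shape[0]
    middle = np.zeros((p, p))
    middle[:u, :u] = np.linalg.inv(Lam)   # Lambda^{-1} block for un_G
    middle[u:, u:] = Omega                # Omega block for the other vertices
    B_inv = np.linalg.inv(B)
    return B_inv @ middle @ B_inv.T

# Hypothetical ancestral graph on vertices 0..3:
#   0 - 1 (undirected), 1 -> 2, 0 -> 3, 2 <-> 3, so un_G = {0, 1}.
Lam = np.array([[2.0, 0.5],
                [0.5, 1.0]])     # Phi(0), Phi(1) on the diagonal, Phi(0 - 1) off it
Omega = np.array([[1.0, 0.3],
                  [0.3, 1.5]])   # Phi(2), Phi(3) and Phi(2 <-> 3)
B = np.eye(4)
B[2, 1] = -0.8                   # Phi(2 <- 1)
B[3, 0] = 0.4                    # Phi(3 <- 0)

Sigma = gaussian_covariance(Lam, Omega, B)
```

Because $\Lambda$ and $\Omega$ are positive definite and $B$ is unit lower triangular, the resulting matrix is symmetric and positive definite, and its leading block equals $\Lambda^{-1}$.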


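Lemma 8.4 If $(\mu, \Phi)$ is a parametrization of an ancestral graph $\mathcal{G}$ then

$$\Sigma_{\mathcal{G}\Phi} = \begin{pmatrix} \Lambda^{-1} & -\Lambda^{-1} B_{du}^{T} B_{dd}^{-T} \\ -B_{dd}^{-1} B_{du} \Lambda^{-1} & B_{dd}^{-1} \big( B_{du} \Lambda^{-1} B_{du}^{T} + \Omega \big) B_{dd}^{-T} \end{pmatrix},$$

$$\Sigma_{\mathcal{G}\Phi}^{-1} = \begin{pmatrix} \Lambda + B_{du}^{T} \Omega^{-1} B_{du} & B_{du}^{T} \Omega^{-1} B_{dd} \\ B_{dd}^{T} \Omega^{-1} B_{du} & B_{dd}^{T} \Omega^{-1} B_{dd} \end{pmatrix}.$$

Proof: Immediate from the definition of $\Sigma_{\mathcal{G}\Phi}$ and Lemma 8.3. □

Note that it follows from the conditions on $B$, $\Lambda$ and $\Omega$ that $\Sigma_{\mathcal{G}\Phi}$ is
positive definite.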


8.1.2  Parametrization  of  a  subgraph 

Lemma 8.5 Let $(\mu, \Phi)$ be a parametrization of an ancestral graph $\mathcal{G} =
(V, E)$. If $A \subseteq V$ such that $\mathrm{ant}(A) = A$, and $(\mu_A, \Phi_A)$ is the parametrization
of the induced subgraph $\mathcal{G}_A$, obtained by restricting $\mu$ to $A$ and $\Phi$ to $A \cup E^*$,
where $E^*$ is the set of edges in $\mathcal{G}_A$, then

$$\Lambda_A^{-1} = (\Lambda^{-1})_{A \cap u\, A \cap u}, \quad \Omega_A = (\Omega)_{A \cap d\, A \cap d}, \quad B_A^{-1} = (B^{-1})_{AA};$$

hence

$$\Sigma_{\mathcal{G}_A \Phi_A} = (\Sigma_{\mathcal{G}\Phi})_{AA},$$

where $\Lambda_A$, $\Omega_A$, $B_A$ are the matrices associated with $\Phi_A$.

In words, if all vertices that are anterior to a set $A$ in $\mathcal{G}$ are contained
in $A$ then the covariance matrix parametrized by the restriction of $\Phi$ to the
induced subgraph $\mathcal{G}_A$ is just the sub-matrix $(\Sigma_{\mathcal{G}\Phi})_{AA}$.

Note the distinction between matrices indexed by two subsets, which
indicate submatrices in the usual way (e.g. $\Sigma_{AA}$), and matrices indexed by one
subset, which are obtained from a parametrization of an induced subgraph
on this set of vertices (e.g. $B_A$).

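Proof: For $\Omega$ there is nothing to prove. Since $A = \mathrm{ant}(A)$, no vertex in
$\mathrm{un}_{\mathcal{G}} \cap A$ is adjacent to a vertex in $\mathrm{un}_{\mathcal{G}} \setminus A$. Thus

$$\Lambda = \begin{pmatrix} \Lambda_{A \cap u\, A \cap u} & 0 \\ 0 & \Lambda_{u \setminus A\, u \setminus A} \end{pmatrix},$$

so $(\Lambda^{-1})_{A \cap u\, A \cap u} = \Lambda_A^{-1}$ as required.

Since $A$ is anterior,

$$B = \begin{pmatrix} B_{AA} & 0 \\ B_{\bar{A}A} & B_{\bar{A}\bar{A}} \end{pmatrix},$$

where $\bar{A} = V \setminus A$. The result then follows by partitioned inversion since
$B_A = (B)_{AA} = B_{AA}$. □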

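If $\mathcal{G} = (V, E)$ is a subgraph of an ancestral graph $\mathcal{G}^* = (V, E^*)$, then there
is a natural mapping $(\mu, \Phi) \mapsto (\mu^*, \Phi^*)$ from $\mathbf{\Phi}(\mathcal{G})$ to $\mathbf{\Phi}(\mathcal{G}^*)$, defined by:

$$\Phi^*(x) = \begin{cases} \Phi(x) & \text{if } x \in V \cup E, \\ 0 & \text{if } x \in E^* \setminus E. \end{cases}$$

$\Phi^*$ simply assigns 0 to edges in $\mathcal{G}^*$ that are not in $\mathcal{G}$ (both graphs have the
same vertex set). It is simple to see that

$$N_{\mathcal{G}\mu\Phi} = N_{\mathcal{G}^*\mu^*\Phi^*}.$$

The next Proposition is an immediate consequence:

Proposition 8.6 If $\mathcal{G} = (V, E)$ is a subgraph of an ancestral graph $\mathcal{G}^* =
(V, E^*)$ then $\mathcal{N}(\mathcal{G}) \subseteq \mathcal{N}(\mathcal{G}^*)$.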


8.1.3  Interpretation  of  parameters 

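Theorem 8.7 If $\mathcal{G} = (V, E)$ is an ancestral graph, $(\mu, \Phi) \in \mathbf{\Phi}(\mathcal{G})$ and
$\Sigma = \Sigma_{\mathcal{G}\Phi}$, then for all vertices $\alpha$ for which $\mathrm{pa}(\alpha) \neq \emptyset$,

$$B_{\{\alpha\}\mathrm{pa}(\alpha)} = -\Sigma_{\{\alpha\}\mathrm{pa}(\alpha)} \Sigma^{-1}_{\mathrm{pa}(\alpha)}; \quad \text{further,} \quad \begin{pmatrix} \Lambda^{-1} & 0 \\ 0 & \Omega \end{pmatrix} = B \Sigma B^{T}. \qquad (2)$$

Regarding $\Sigma$ as the covariance matrix for a (normal) random vector $X_V$,
the Theorem states that $\Phi(\alpha \leftarrow \nu)$ is $-1$ times the coefficient of $X_\nu$ in the
regression of $X_\alpha$ on $X_{\mathrm{pa}(\alpha)}$. $\Omega$ is the covariance matrix of the residuals from
this set of regressions. $\Lambda$ is just the inverse covariance matrix for $X_{\mathrm{un}_{\mathcal{G}}}$. Hence
if $\Sigma$ is obtained from some unknown covariance function $\Phi$ for an ancestral
graph $\mathcal{G}$, then equation (2) allows us to reconstruct $\Phi$ from $\mathcal{G}$ and $\Sigma$.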
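
The reconstruction described by Theorem 8.7 can be sketched in a few lines. This is an illustrative example rather than the authors' procedure: the function name `recover_parameters`, the `parents` dictionary and the test graph (0 - 1, 1 -> 2, 0 -> 3, 2 <-> 3, with $\mathrm{un}_{\mathcal{G}} = \{0, 1\}$ listed first) are assumptions made for the example. Each row of $B$ is obtained from a regression on the parents, and $\Lambda$, $\Omega$ are then read off the blocks of $B \Sigma B^{T}$.

```python
import numpy as np

def recover_parameters(Sigma, parents, n_undirected):
    """Recover (B, Lambda, Omega) from Sigma using equation (2).

    parents maps a vertex to the list of its parents; the first
    n_undirected vertices are assumed to form un_G.
    """
    p = Sigma.shape[0]
    B = np.eye(p)
    for a, pa in parents.items():
        pa = list(pa)
        if pa:
            # Regression coefficients of X_a on X_pa(a); B holds their negatives.
            coef = Sigma[a, pa] @ np.linalg.inv(Sigma[np.ix_(pa, pa)])
            B[a, pa] = -coef
    D = B @ Sigma @ B.T              # block diagonal: diag(Lambda^{-1}, Omega)
    u = n_undirected
    return B, np.linalg.inv(D[:u, :u]), D[u:, u:]

# Build Sigma from known (hypothetical) parameters, then recover them.
Lam0 = np.array([[2.0, 0.5], [0.5, 1.0]])
Omega0 = np.array([[1.0, 0.3], [0.3, 1.5]])
B0 = np.eye(4)
B0[2, 1], B0[3, 0] = -0.8, 0.4
middle = np.block([[np.linalg.inv(Lam0), np.zeros((2, 2))],
                   [np.zeros((2, 2)), Omega0]])
B0_inv = np.linalg.inv(B0)
Sigma = B0_inv @ middle @ B0_inv.T

B, Lam, Omega = recover_parameters(Sigma, {2: [1], 3: [0]}, n_undirected=2)
```

The recovered matrices agree with the ones used to build $\Sigma$, which is the identifiability statement in numerical form.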

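Proof: Suppose that $\Sigma = \Sigma_{\mathcal{G}\Phi}$ for some parametrization $(\mu, \Phi)$. If every
vertex has no parents then $B$ is the identity matrix and the claim holds
trivially.

Suppose that $\alpha$ is a vertex with $\mathrm{pa}(\alpha) \neq \emptyset$, hence by definition $\alpha \in V \setminus \mathrm{un}_{\mathcal{G}}$.
Let $A = \mathrm{ant}(\alpha)$, $e = \mathrm{ant}(\alpha) \setminus (\mathrm{pa}(\alpha) \cup \{\alpha\})$, $p = \mathrm{pa}(\alpha)$. By Lemma 8.5,

$$\Sigma_{AA} = B_A^{-1} \begin{pmatrix} \Lambda_A^{-1} & 0 \\ 0 & \Omega_A \end{pmatrix} B_A^{-T}. \qquad (3)$$

Since $\mathcal{G}$ is ancestral, $\mathrm{sp}_{\mathcal{G}}(\alpha) \cap A = \emptyset$. Thus partitioning $A$ into $e$, $p$, $\{\alpha\}$, we
obtain

$$B_A = \begin{pmatrix} B_{ee} & 0 & 0 \\ B_{pe} & B_{pp} & 0 \\ 0 & B_{\alpha p} & 1 \end{pmatrix}, \quad \text{and} \quad \Omega_A = \begin{pmatrix} \Omega_{e \cap d\, e \cap d} & \Omega_{e \cap d\, p \cap d} & 0 \\ \Omega_{p \cap d\, e \cap d} & \Omega_{p \cap d\, p \cap d} & 0 \\ 0 & 0 & \omega_{\alpha\alpha} \end{pmatrix}.$$

The expression for $B_{\{\alpha\}\mathrm{pa}(\alpha)} = B_{\alpha p}$ then follows from (3) by routine
calculation. The second claim is an immediate consequence of (1). □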

8.1.4  Identifiability 

Corollary 8.8 If $\mathcal{G}$ is an ancestral graph, $\Phi_1$, $\Phi_2$ are two covariance
functions for $\mathcal{G}$ and $\Sigma_{\mathcal{G}\Phi_1} = \Sigma_{\mathcal{G}\Phi_2}$ then $\Phi_1(\cdot) = \Phi_2(\cdot)$. Hence the mapping
$\Phi \mapsto \Sigma_{\mathcal{G}\Phi}$ is one-to-one.

Proof: This follows directly from Theorem 8.7: both $\Phi_1$ and $\Phi_2$ satisfy
equation (2) and hence are identical. □

8.1.5 $\mathcal{N}(\mathcal{G})$ for a complete ancestral graph is saturated

Theorem 8.9 If $\mathcal{G} = (V, E)$ is a complete ancestral graph then $\mathcal{N}(\mathcal{G}) = \mathcal{N}_{|V|}$.

In words, a complete ancestral graph parametrizes the saturated Gaussian
model of dimension $|V|$.

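Proof: Let $\Sigma$ be an arbitrary p.d. matrix of dimension $|V|$. It is sufficient to
show that there exists a covariance function $\Phi$ for $\mathcal{G}$, such that $\Sigma = \Sigma_{\mathcal{G}\Phi}$. We
may apply equation (2) to obtain matrices $B$, $\Lambda$ and $\Omega$ from $\Sigma$. However, it
still remains to show that (a) whenever there is a non-zero off-diagonal entry
in $\Lambda$, $\Omega$ or $B$, there is an edge of the appropriate type in $\mathcal{G}$ to associate with
it, and (b) $\Lambda$ and $\Omega$ are positive definite.

By Lemma 3.21(ii), $\mathcal{G}_{\mathrm{un}_{\mathcal{G}}}$ is complete, hence in $\Lambda$ all off-diagonal entries
are permitted to be non-zero.

It follows directly from the construction of $B$ given by (2) that if $(B)_{\alpha\beta} \neq
0$ and $\alpha \neq \beta$ then $\beta \in \mathrm{pa}(\alpha)$.

Now suppose $\alpha, \beta \in V \setminus \mathrm{un}_{\mathcal{G}}$, and there is no edge $\alpha \leftrightarrow \beta$ in $\mathcal{G}$. Since $\mathcal{G}$
is complete, it follows from Lemma 3.21(iii) that either $\alpha \leftarrow \beta$, or $\alpha \rightarrow \beta$.
Without loss of generality suppose the former, and let $A = \mathrm{ant}(\alpha) = \mathrm{pa}(\alpha) \cup
\{\alpha\}$ since $\mathcal{G}$ is complete. Then, writing $p = \mathrm{pa}(\alpha)$,

$$(B \Sigma B^{T})_{\alpha\beta} = (B_A \Sigma_{AA} B_A^{T})_{\alpha\beta} = \left( \begin{pmatrix} B_{pp} & 0 \\ -\Sigma_{\alpha p} \Sigma_{pp}^{-1} & 1 \end{pmatrix} \begin{pmatrix} \Sigma_{pp} & \Sigma_{p\alpha} \\ \Sigma_{\alpha p} & \Sigma_{\alpha\alpha} \end{pmatrix} \begin{pmatrix} B_{pp}^{T} & -\Sigma_{pp}^{-1} \Sigma_{p\alpha} \\ 0 & 1 \end{pmatrix} \right)_{\alpha\beta} = 0$$

as required. The same argument applies in the case where $\beta \in \mathrm{un}_{\mathcal{G}}$, $\alpha \in
V \setminus \mathrm{un}_{\mathcal{G}}$, and hence $\alpha \leftarrow \beta$, thus establishing that $B \Sigma B^{T}$ is block-diagonal
with blocks $\Lambda^{-1}$ and $\Omega$. This establishes (a).

Since, by hypothesis, $\Sigma$ is p.d. and $B$ is non-singular, by construction, it
follows that $\Lambda$ and $\Omega$ are also p.d., hence (b) holds. We now have:

$$\Sigma_{\mathcal{G}\Phi} = B^{-1} \begin{pmatrix} \Lambda^{-1} & 0 \\ 0 & \Omega \end{pmatrix} B^{-T} = B^{-1} B \Sigma B^{T} B^{-T} = \Sigma.$$

□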


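8.1.6 Entries in $\Omega^{-1}$ and $\mathcal{G}_{\leftrightarrow}$

If $\mathcal{G} = (V, E)$ is an ancestral graph then we define $\mathcal{G}_{\leftrightarrow}$ to be the
subgraph with vertex set $V$, but including only the bi-directed edges in $E$.

Lemma 8.10 If $\alpha, \beta \in V \setminus \mathrm{un}_{\mathcal{G}}$ and $\alpha$ is not adjacent to $\beta$ in $(\mathcal{G}_{\leftrightarrow})^a$ then

$$(\Omega^{-1})_{\alpha\beta} = 0,$$

for any $\Omega$ obtained from a covariance function $\Phi$ for $\mathcal{G}$.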

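Proof: (Based on the proof of Lemma 3.1.6 in Koster (1999a).)

First recall that $\alpha$ and $\beta$ are adjacent in $(\mathcal{G}_{\leftrightarrow})^a$ if and only if $\alpha$ and $\beta$ are
collider connected in $\mathcal{G}_{\leftrightarrow}$. The proof is by induction on $|d| = |V \setminus \mathrm{un}_{\mathcal{G}}|$.

If $|d| = 2$ then $(\Omega^{-1})_{\alpha\beta} = -(\Omega)_{\alpha\beta} |\Omega|^{-1} = 0$ as there is no edge $\alpha \leftrightarrow \beta$ in $\mathcal{G}$.

For $|d| > 2$, note that by partitioned inversion:

$$(\Omega^{-1})_{\alpha\beta} = -\left( \omega_{\alpha\beta} - \Omega_{\{\alpha\}c}\, \Omega_{cc}^{-1}\, \Omega_{c\{\beta\}} \right) |\Omega_{\{\alpha,\beta\}.c}|^{-1} \qquad (4)$$

$$\phantom{(\Omega^{-1})_{\alpha\beta}} = -\Big( \omega_{\alpha\beta} - \sum_{\gamma, \delta \in c} \omega_{\alpha\gamma} (\Omega_c^{-1})_{\gamma\delta}\, \omega_{\delta\beta} \Big) |\Omega_{\{\alpha,\beta\}.c}|^{-1} \qquad (5)$$

where $c = d \setminus \{\alpha, \beta\}$, $\Omega_c^{-1} = (\Omega_{cc})^{-1}$ and

$$\Omega_{\{\alpha,\beta\}.c} = \Omega_{\{\alpha,\beta\}\{\alpha,\beta\}} - \Omega_{\{\alpha,\beta\}c}\, \Omega_{cc}^{-1}\, \Omega_{c\{\alpha,\beta\}}.$$

Since $\alpha$ and $\beta$ are not adjacent in $(\mathcal{G}_{\leftrightarrow})^a$ there is no edge $\alpha \leftrightarrow \beta$ in $\mathcal{G}$; hence
$\omega_{\alpha\beta} = 0$. Now consider each term in the sum (5). If there is no edge $\alpha \leftrightarrow \gamma$
or no edge $\delta \leftrightarrow \beta$ then $\omega_{\alpha\gamma} (\Omega_c^{-1})_{\gamma\delta}\, \omega_{\delta\beta} = 0$. If there are edges $\alpha \leftrightarrow \gamma$ and
$\delta \leftrightarrow \beta$ in $\mathcal{G}$ then $\gamma \neq \delta$ as otherwise $\alpha$ and $\beta$ would be collider connected in
$\mathcal{G}_{\leftrightarrow}$, and further $\gamma$ and $\delta$ are not collider connected in $(\mathcal{G}_c)_{\leftrightarrow}$. Hence by the
inductive hypothesis, $(\Omega_c^{-1})_{\gamma\delta} = 0$. Thus every term in the sum is zero and
we are done. □

An alternative proof follows from the Markov properties of undirected
graphical Gaussian models (see Lauritzen, 1996): view the specification of $\Omega$
formally as if it were an inverse covariance matrix for a model represented by
an undirected graph $\mathcal{U}$. Then $\alpha$ and $\beta$ are not collider connected in $\mathcal{G}$ if and
only if $\alpha$ and $\beta$ are not connected in $\mathcal{U}$. Hence by the global Markov property
for undirected graphs, $\alpha$ and $\beta$ are marginally independent, so $(\Omega^{-1})_{\alpha\beta} = 0$.
(We thank S. Lauritzen for this observation.)

It also follows directly from the previous Lemma (and this discussion) that
$\Omega^{-1}$ will be block diagonal. (We thank N. Wermuth for this observation.)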
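
The zero pattern asserted by Lemma 8.10 is easy to observe numerically. The sketch below is illustrative only: the two small matrices are hypothetical choices of $\Omega$, the first for bi-directed edges 0 <-> 1 and 2 <-> 3 (two components of $\mathcal{G}_{\leftrightarrow}$), the second for a chain 0 <-> 1 <-> 2 in which vertices 0 and 2 are collider connected.

```python
import numpy as np

# Hypothetical Omega for bi-directed edges 0 <-> 1 and 2 <-> 3 only.
Omega = np.array([[1.0, 0.4, 0.0, 0.0],
                  [0.4, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.3],
                  [0.0, 0.0, 0.3, 2.0]])
Omega_inv = np.linalg.inv(Omega)   # block diagonal: no path between {0,1} and {2,3}

# Hypothetical Omega for the chain 0 <-> 1 <-> 2: vertices 0 and 2 are not
# adjacent, but they are collider connected through vertex 1.
Omega_chain = np.array([[1.0, 0.4, 0.0],
                        [0.4, 1.0, 0.3],
                        [0.0, 0.3, 1.0]])
chain_inv = np.linalg.inv(Omega_chain)
```

In the first case every entry of $\Omega^{-1}$ linking the two components vanishes; in the second, the $(0, 2)$ entry of the inverse is non-zero even though $\omega_{02} = 0$, which is why the lemma is stated in terms of $(\mathcal{G}_{\leftrightarrow})^a$ rather than $\mathcal{G}_{\leftrightarrow}$ itself.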

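Corollary 8.11 Let $\mathcal{G}$ be an ancestral graph with $\alpha \leftrightarrow \beta$ in $\mathcal{G}$. Let $\mathcal{G}'$ be
the subgraph formed by removing the $\alpha \leftrightarrow \beta$ edge in $\mathcal{G}$. If $\alpha$ and $\beta$ are not
adjacent in $(\mathcal{G}'_{\leftrightarrow})^a$ then

$$(\Omega^{-1})_{\alpha\beta} = -\Phi(\alpha \leftrightarrow \beta)\, |\Omega_{\{\alpha,\beta\}.c}|^{-1}$$

where $c = d \setminus \{\alpha, \beta\}$, $\Phi$ is a covariance function for $\mathcal{G}$, and $\Omega$ is the associated
matrix.

Note that we adopt the convention: $\Omega_{\{\alpha,\beta\}.c} = \Omega_{\{\alpha,\beta\}\{\alpha,\beta\}}$ when $c = \emptyset$.

Proof: By the argument used in the proof of Lemma 8.10, it is clear that the
sum in equation (5) is equal to 0. The result then follows since, by definition,
$\omega_{\alpha\beta} = \Phi(\alpha \leftrightarrow \beta)$. □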


8.2  Gaussian  Independence  Models 


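A Gaussian independence model, $\mathcal{N}(\mathfrak{I})$, is the set of non-singular normal
distributions obeying the independence relations in $\mathfrak{I}$:

$$\mathcal{N}(\mathfrak{I}) = \mathcal{N}_{|V|} \cap \mathcal{P}(\mathfrak{I})$$

where $V$ is the set of vertices in $\mathfrak{I}$. As noted in Section 7, normal distributions
in $\mathcal{N}(\mathfrak{I})$ may also satisfy other independence relations.

Proposition 8.12 If $\mathcal{G}'$ is a subgraph of $\mathcal{G}$ then $\mathcal{N}(\mathfrak{I}_m(\mathcal{G}')) \subseteq \mathcal{N}(\mathfrak{I}_m(\mathcal{G}))$.

Proof: Follows directly from Proposition 3.12. □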


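Theorem 8.13 If $\mathcal{G}_1$, $\mathcal{G}_2$ are two ancestral graphs then

$$\mathcal{N}(\mathfrak{I}_m(\mathcal{G}_1)) = \mathcal{N}(\mathfrak{I}_m(\mathcal{G}_2)) \text{ if and only if } \mathfrak{I}_m(\mathcal{G}_1) = \mathfrak{I}_m(\mathcal{G}_2).$$

Proof: If $\mathfrak{I}_m(\mathcal{G}_1) = \mathfrak{I}_m(\mathcal{G}_2)$, then $\mathcal{N}(\mathfrak{I}_m(\mathcal{G}_1)) = \mathcal{N}(\mathfrak{I}_m(\mathcal{G}_2))$ by definition.

Conversely, by Theorem 7.5 there is a normal distribution $N_1$ that is faithful to
$\mathfrak{I}_m(\mathcal{G}_1)$. Hence

$$(A, B \mid Z) \in \mathfrak{I}_m(\mathcal{G}_1) \iff A \perp\!\!\!\perp B \mid Z \; [N_1].$$

Since $\mathcal{N}(\mathfrak{I}_m(\mathcal{G}_1)) = \mathcal{N}(\mathfrak{I}_m(\mathcal{G}_2))$, $N_1 \in \mathcal{N}(\mathfrak{I}_m(\mathcal{G}_2))$, hence $\mathfrak{I}_m(\mathcal{G}_2) \subseteq \mathfrak{I}_m(\mathcal{G}_1)$.
The reverse inclusion may be argued symmetrically. □

8.3 Equivalence of Gaussian Parametrizations and Independence Models for Maximal Ancestral Graphs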

The  main  result  of  this  section  is  the  following: 

Theorem 8.14 If $\mathcal{G}$ is a maximal ancestral graph then

$$\mathcal{N}(\mathcal{G}) = \mathcal{N}(\mathfrak{I}_m(\mathcal{G})).$$

In words, if $\mathcal{G}$ is a maximal ancestral graph then the set of normal
distributions that may be obtained by parametrizing $\mathcal{G}$ is exactly the set of normal
distributions that obey the independence relations in $\mathfrak{I}_m(\mathcal{G})$.

Note that Wermuth et al. (1994) refer to a 'parametrization' of an
independence model when describing a parametrization of a (possibly proper)
subset of $\mathcal{N}(\mathfrak{I})$. To distinguish their usage from the stronger sense in which
the term is used here, we may say that a parametrization is full if all
distributions in $\mathcal{N}(\mathfrak{I})$ are parametrized. In these terms Theorem 8.14 states that
if $\mathcal{G}$ is maximal then the parametrization of $\mathcal{G}$ described in Section 8.1 is a
full parametrization of $\mathcal{N}(\mathfrak{I}_m(\mathcal{G}))$.

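8.3.1 $\mathcal{N}(\mathcal{G})$ when $\mathcal{G}$ is not maximal

If $\mathcal{G}$ is not maximal then $\mathcal{N}(\mathcal{G})$ is a proper subset of $\mathcal{N}(\mathfrak{I}_m(\mathcal{G}))$, as the
following example illustrates: consider the non-maximal ancestral graph $\mathcal{G}$ shown
in Figure 9(a). Since $\mathfrak{I}_m(\mathcal{G}) = \emptyset$, $\mathcal{N}(\mathfrak{I}_m(\mathcal{G})) = \mathcal{N}_4$, the saturated model.
However, there are 10 free parameters in $\mathcal{N}_4$ and yet there are only 5 edges
and 4 vertices, giving 9 parameters in $\mathcal{N}(\mathcal{G})$. Direct calculation shows that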

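$$\sigma_{\gamma\delta} - \frac{\sigma_{\gamma\alpha} \sigma_{\alpha\delta}}{\sigma_{\alpha\alpha}} - \frac{\sigma_{\gamma\beta} \sigma_{\beta\delta}}{\sigma_{\beta\beta}} + \frac{\sigma_{\gamma\alpha} \sigma_{\alpha\beta} \sigma_{\beta\delta}}{\sigma_{\alpha\alpha} \sigma_{\beta\beta}} = 0$$

where $\sigma_{\epsilon\phi} = (\Sigma_{\mathcal{G}\Phi})_{\epsilon\phi}$. This will clearly not hold for all distributions in $\mathcal{N}_4$.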
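
A constraint of this kind can be checked numerically. The sketch below rests on an assumption that should be stated plainly: Figure 9(a) is not reproduced in this section, so the graph used here (edges $\alpha \rightarrow \gamma$, $\beta \rightarrow \delta$, $\alpha \leftrightarrow \beta$, $\beta \leftrightarrow \gamma$, $\alpha \leftrightarrow \delta$, with $\gamma$ and $\delta$ non-adjacent) is a hypothetical non-maximal ancestral graph with 4 vertices and 5 edges, and may differ from the one in the figure. The function name `constraint` and all parameter values are likewise illustrative.

```python
import numpy as np

a, b, g, d = 0, 1, 2, 3  # indices for alpha, beta, gamma, delta

def constraint(s):
    """Left-hand side of the non-independence constraint for a covariance matrix s."""
    return (s[g, d]
            - s[g, a] * s[a, d] / s[a, a]
            - s[g, b] * s[b, d] / s[b, b]
            + s[g, a] * s[a, b] * s[b, d] / (s[a, a] * s[b, b]))

# Parametrization of the assumed graph: no undirected edges, so Lambda is empty
# and Sigma = B^{-1} Omega B^{-T}.
Omega = np.eye(4)
Omega[a, b] = Omega[b, a] = 0.3    # alpha <-> beta
Omega[b, g] = Omega[g, b] = 0.2    # beta <-> gamma
Omega[a, d] = Omega[d, a] = 0.25   # alpha <-> delta
B = np.eye(4)
B[g, a] = -0.7                     # Phi(gamma <- alpha)
B[d, b] = 0.5                      # Phi(delta <- beta)
B_inv = np.linalg.inv(B)
Sigma_model = B_inv @ Omega @ B_inv.T

# A generic positive definite matrix, i.e. an arbitrary element of the
# saturated model, for comparison.
Sigma_generic = np.array([[2.0, 0.6, 0.3, 0.5],
                          [0.6, 1.5, 0.2, 0.4],
                          [0.3, 0.2, 1.0, 0.1],
                          [0.5, 0.4, 0.1, 1.2]])

value_model = constraint(Sigma_model)
value_generic = constraint(Sigma_generic)
```

For the parametrized graph the expression vanishes up to rounding error, whereas for the generic matrix it does not, in line with the parameter count of 9 against 10.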

8.3.2 If $\mathcal{G}$ is maximal then $\mathcal{N}(\mathfrak{I}_m(\mathcal{G})) \subseteq \mathcal{N}(\mathcal{G})$

We first require two Lemmas:

Lemma 8.15 Let $\mathcal{G} = (V, E)$ be an ancestral graph, $e$ an edge in $E$ with
endpoints $(\alpha, \beta)$ and $V = \mathrm{ant}_{\mathcal{G}}(\{\alpha, \beta\})$. If $\mathcal{G}' = (V, E \setminus \{e\})$ is maximal, then
for an arbitrary covariance function $\Phi$ for $\mathcal{G}$, $(\Sigma_{\mathcal{G}\Phi}^{-1})_{\alpha\beta} = 0$ implies $\Phi(e) = 0$.

In words, if in a graph $\mathcal{G}$, removing an edge, $e$, between $\alpha$ and $\beta$ results in
a graph that is still maximal, then in any distribution $N_{\mathcal{G}\mu\Phi}$ obtained from a
parametrization $(\mu, \Phi)$ of $\mathcal{G}$, if the partial correlation between $\alpha$ and $\beta$ given
$V \setminus \{\alpha, \beta\}$ is zero, then $\Phi$ assigns zero to the edge $e$.

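Proof: There are three cases, depending on the type of the edge $e$:

(1) $e$ is undirected;

In this case $\alpha, \beta \in \mathrm{un}_{\mathcal{G}}$. Then by Lemma 8.4,

$$(\Sigma_{\mathcal{G}\Phi}^{-1})_{\alpha\beta} = (\Lambda + B_{du}^{T} \Omega^{-1} B_{du})_{\alpha\beta}.$$

However, since $V = \mathrm{ant}_{\mathcal{G}}(\{\alpha, \beta\})$, $d = \emptyset$, hence $(\Sigma_{\mathcal{G}\Phi}^{-1})_{\alpha\beta} = (\Lambda)_{\alpha\beta} =
\Phi(\alpha - \beta)$, so $\Phi(\alpha - \beta) = 0$ as required.

(2) $e$ is directed;

Without loss of generality, suppose $\alpha \leftarrow \beta$. It now follows from Lemma 8.4 that

$$(\Sigma_{\mathcal{G}\Phi}^{-1})_{\alpha\beta} = \sum_{\gamma, \delta \in d} b_{\gamma\alpha} (\Omega^{-1})_{\gamma\delta}\, b_{\delta\beta}.$$

Now, $b_{\gamma\alpha} = 0$ for $\alpha \neq \gamma$ since $\mathrm{ch}_{\mathcal{G}}(\alpha) = \emptyset$, and $b_{\alpha\alpha} = 1$ by definition.
Hence

$$(\Sigma_{\mathcal{G}\Phi}^{-1})_{\alpha\beta} = \sum_{\delta \in d} (\Omega^{-1})_{\alpha\delta}\, b_{\delta\beta}.$$

Since $\beta \rightarrow \alpha$, $\beta \in \mathrm{ant}_{\mathcal{G}}(\alpha)$, so $V = \mathrm{ant}_{\mathcal{G}}(\alpha)$. Thus if $\delta \in V$, $\alpha \neq \delta$, and
$\alpha$ and $\delta$ are connected by a path $\pi$ in $\mathcal{G}_{\leftrightarrow}$ containing more than one
edge (see p.60), then $\pi$ is a primitive inducing path between $\alpha$ and $\delta$
in $\mathcal{G}$. But this is a contradiction, since $\delta \in \mathrm{ant}_{\mathcal{G}}(\alpha)$, and yet by Lemma
4.5(ii), $\delta \notin \mathrm{ant}_{\mathcal{G}}(\alpha)$. Hence by Lemma 8.10, $(\Omega^{-1})_{\alpha\delta} = 0$ for $\delta \neq \alpha$.
Consequently,

$$(\Sigma_{\mathcal{G}\Phi}^{-1})_{\alpha\beta} = (\Omega^{-1})_{\alpha\alpha}\, b_{\alpha\beta} = (\Omega^{-1})_{\alpha\alpha}\, \Phi(\alpha \leftarrow \beta).$$

As $\Omega$ is positive definite, $(\Omega^{-1})_{\alpha\alpha} > 0$, hence $\Phi(\alpha \leftarrow \beta) = 0$.

(3) $e$ is bi-directed.

Again it follows from Lemma 8.4 that

$$(\Sigma_{\mathcal{G}\Phi}^{-1})_{\alpha\beta} = \sum_{\gamma, \delta \in d} b_{\gamma\alpha} (\Omega^{-1})_{\gamma\delta}\, b_{\delta\beta}.$$

As $\mathrm{ch}_{\mathcal{G}}(\{\alpha, \beta\}) = \emptyset$, $b_{\gamma\alpha} = 0$ for $\gamma \neq \alpha$, and likewise $b_{\delta\beta} = 0$ for $\delta \neq \beta$.
By definition, $b_{\beta\beta} = b_{\alpha\alpha} = 1$. Since, by hypothesis, $\mathcal{G}'$ is maximal, $\alpha$
and $\beta$ are not adjacent in $(\mathcal{G}'_{\leftrightarrow})^a$, so

$$(\Sigma_{\mathcal{G}\Phi}^{-1})_{\alpha\beta} = (\Omega^{-1})_{\alpha\beta} = -\Phi(\alpha \leftrightarrow \beta)\, |\Omega_{\{\alpha,\beta\}.c}|^{-1},$$

the second equality following by Corollary 8.11. Hence $\Phi(\alpha \leftrightarrow \beta) = 0$
as required. □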

Note  that  case  (2)  could  alternatively  have  been  proved  by  direct  appeal 
to the interpretation of $\Phi(\alpha \leftarrow \beta)$ as a regression coefficient, as shown by
Theorem  8.7.  However,  such  a  proof  is  not  available  in  case  (3),  and  we 
believe  that  the  current  proof  provides  greater  insight  into  the  role  played  by 
the  graphical  structure. 

The  next  Lemma  provides  the  inductive  step  in  the  proof  of  the  claim 
which  follows. 

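Lemma 8.16 Let $\mathcal{G} = (V, E)$ be an ancestral graph and $e$ an edge in $E$. If
$\mathcal{G}' = (V, E \setminus \{e\})$ is maximal, and $\mathrm{un}_{\mathcal{G}} = \mathrm{un}_{\mathcal{G}'}$, then

$$\mathcal{N}(\mathcal{G}) \cap \mathcal{N}(\mathfrak{I}_m(\mathcal{G}')) \subseteq \mathcal{N}(\mathcal{G}').$$

Proof: Let $N \in \mathcal{N}(\mathcal{G}) \cap \mathcal{N}(\mathfrak{I}_m(\mathcal{G}'))$, with covariance matrix $\Sigma$, and
parametrization $\Phi_{\mathcal{G}}$. Let $e$ have endpoints $\alpha$, $\beta$. Since $\mathrm{un}_{\mathcal{G}} = \mathrm{un}_{\mathcal{G}'}$ it is sufficient
to show that $\Phi_{\mathcal{G}}(e) = 0$, because in this case, the restriction of $\Phi_{\mathcal{G}}$ to the
edges (and vertices) in $\mathcal{G}'$ is a parametrization of $\mathcal{G}'$, hence $N \in \mathcal{N}(\mathcal{G}')$.

Let $A = \mathrm{ant}_{\mathcal{G}'}(\{\alpha, \beta\}) = \mathrm{ant}_{\mathcal{G}}(\{\alpha, \beta\})$. Since $\alpha$, $\beta$ are not adjacent in $\mathcal{G}'$
and $\mathcal{G}'$ is maximal, it follows from Corollary 5.3 that

$$(\{\alpha\}, \{\beta\} \mid \mathrm{ant}_{\mathcal{G}'}(\{\alpha, \beta\}) \setminus \{\alpha, \beta\}) \in \mathfrak{I}_m(\mathcal{G}').$$

Since $N \in \mathcal{N}(\mathfrak{I}_m(\mathcal{G}'))$, it then follows from standard properties of the Normal
distribution that $(\Sigma_{AA}^{-1})_{\alpha\beta} = 0$. By Lemma 8.5, $\Sigma_{AA}$ is parametrized by $\Phi_A$,
the restriction of $\Phi_{\mathcal{G}}$ to the edges and vertices in the induced subgraph $\mathcal{G}_A$.
The result then follows by applying Lemma 8.15 to $\mathcal{G}_A$, giving $\Phi_A(e) =
\Phi_{\mathcal{G}}(e) = 0$. □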

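We are now in a position to prove that if $\mathcal{G}$ is maximal then all
distributions in $\mathcal{N}(\mathfrak{I}_m(\mathcal{G}))$ may be obtained by parametrizing $\mathcal{G}$. This constitutes
one half of Theorem 8.14:

Claim: If $\mathcal{G}$ is maximal then $\mathcal{N}(\mathfrak{I}_m(\mathcal{G})) \subseteq \mathcal{N}(\mathcal{G})$.

Proof: Suppose $N \in \mathcal{N}(\mathfrak{I}_m(\mathcal{G}))$. Let $\bar{\mathcal{G}}$ be the completed graph defined in
Section 5.2. By Theorem 8.9, $\mathcal{N}_{|V|} = \mathcal{N}(\bar{\mathcal{G}})$, hence $N \in \mathcal{N}(\bar{\mathcal{G}})$. By Theorem
5.6, there exists a sequence of maximal ancestral graphs $\bar{\mathcal{G}} = \mathcal{G}_0, \ldots, \mathcal{G}_r = \mathcal{G}$
where $r$ is the number of pairs of non-adjacent vertices in $\mathcal{G}$ and $\mathrm{un}_{\mathcal{G}_0} = \cdots = \mathrm{un}_{\mathcal{G}_r}$.
Now by Proposition 8.12,

$$\mathcal{N}(\mathfrak{I}_m(\mathcal{G}_r)) \subseteq \cdots \subseteq \mathcal{N}(\mathfrak{I}_m(\mathcal{G}_0)) = \mathcal{N}_{|V|},$$

hence $N \in \mathcal{N}(\mathfrak{I}_m(\mathcal{G}_i))$, for $0 \leq i \leq r$. We thus may apply Lemma 8.16
$r$ times to show successively

$$N \in \mathcal{N}(\mathcal{G}_i) \cap \mathcal{N}(\mathfrak{I}_m(\mathcal{G}_{i+1})) \text{ implies } N \in \mathcal{N}(\mathcal{G}_{i+1})$$

for $i = 0$ to $r - 1$. Hence $N \in \mathcal{N}(\mathcal{G}_r) = \mathcal{N}(\mathcal{G})$ as required. □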

8.3.3 $\mathcal{N}(\mathcal{G})$ obeys the global Markov property for $\mathcal{G}$

The  following  lemma  provides  a  partial  converse  to  Lemma  8.15. 

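Lemma 8.17 If $\Phi$ is a covariance function for an ancestral graph $\mathcal{G} =
(V, E)$, and $\alpha, \beta \in V$ are not adjacent in $(\mathcal{G})^a$ then $(\Sigma_{\mathcal{G}\Phi}^{-1})_{\alpha\beta} = 0$.

Proof: There are two cases to consider:

(1) $\alpha \notin \mathrm{un}_{\mathcal{G}}$ or $\beta \notin \mathrm{un}_{\mathcal{G}}$;

By Lemma 8.4:

$$(\Sigma_{\mathcal{G}\Phi}^{-1})_{\alpha\beta} = \sum_{\gamma, \delta \in d} b_{\gamma\alpha} (\Omega^{-1})_{\gamma\delta}\, b_{\delta\beta}. \qquad (6)$$

If $b_{\gamma\alpha} \neq 0$ and $b_{\delta\beta} \neq 0$ then there are edges $\alpha \rightarrow \gamma$, $\beta \rightarrow \delta$ in $\mathcal{G}$, hence
$\gamma \neq \delta$, $\beta \neq \gamma$ and $\alpha \neq \delta$ since otherwise $\alpha$ and $\beta$ are adjacent in $(\mathcal{G})^a$.
Further, there is no path between $\gamma$ and $\delta$ in $\mathcal{G}_{\leftrightarrow}$ since if there were, $\alpha$
and $\beta$ would be collider connected in $\mathcal{G}$, hence adjacent in $(\mathcal{G})^a$. Thus
$\gamma$ and $\delta$ are not adjacent in $(\mathcal{G}_{\leftrightarrow})^a$ and so by Lemma 8.10 $(\Omega^{-1})_{\gamma\delta} = 0$.
Consequently every term in the sum in (6) is zero as required.

(2) $\alpha, \beta \in \mathrm{un}_{\mathcal{G}}$.

Again by Lemma 8.4:

$$(\Sigma_{\mathcal{G}\Phi}^{-1})_{\alpha\beta} = \lambda_{\alpha\beta} + \sum_{\gamma, \delta \in d} b_{\gamma\alpha} (\Omega^{-1})_{\gamma\delta}\, b_{\delta\beta}. \qquad (7)$$

If $\alpha$, $\beta$ are not adjacent in $(\mathcal{G})^a$ then $\alpha$ and $\beta$ are not adjacent in $\mathcal{G}$.
Hence $\lambda_{\alpha\beta} = 0$. The argument used in case (1) may now be repeated
to show that every term in the sum in (7) is zero. □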
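
Lemma 8.17 can be checked on a small example. The sketch below is illustrative and uses a hypothetical graph and parameter values that do not come from the paper: vertices 0 to 3 with edges 0 - 1, 1 -> 2 and 2 <-> 3, so that $\mathrm{un}_{\mathcal{G}} = \{0, 1\}$. In the augmented graph $(\mathcal{G})^a$ the pairs (0, 2) and (0, 3) are not adjacent, whereas (1, 3) is adjacent because 1 -> 2 <-> 3 is a collider path.

```python
import numpy as np

# Hypothetical parametrization of the graph 0 - 1, 1 -> 2, 2 <-> 3.
Lam = np.array([[1.5, 0.4],
                [0.4, 1.0]])      # undirected part on un_G = {0, 1}
Omega = np.array([[1.0, 0.3],
                  [0.3, 2.0]])    # bi-directed part on {2, 3}
B = np.eye(4)
B[2, 1] = -0.6                    # Phi(2 <- 1)

# Sigma from equation (1), then the concentration matrix Sigma^{-1}.
middle = np.zeros((4, 4))
middle[:2, :2] = np.linalg.inv(Lam)
middle[2:, 2:] = Omega
B_inv = np.linalg.inv(B)
Sigma = B_inv @ middle @ B_inv.T
K = np.linalg.inv(Sigma)
```

The entries `K[0, 2]` and `K[0, 3]` vanish up to rounding error, while `K[1, 3]` does not, even though vertices 1 and 3 are not adjacent in the graph itself.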

The next lemma proves the second half of Theorem 8.14. It does not
require $\mathcal{G}$ to be maximal, so we state it as a separate lemma.

Lemma 8.18 If $\mathcal{G}$ is an ancestral graph then $\mathcal{N}(\mathcal{G}) \subseteq \mathcal{N}(\mathfrak{I}_m(\mathcal{G}))$.

In words, any normal distribution obtained by parametrizing an ancestral
graph $\mathcal{G}$ obeys the global Markov property for $\mathcal{G}$.

Proof: Suppose that $(X, Y \mid Z) \in \mathfrak{I}_m(\mathcal{G})$. If $\nu \in \mathrm{ant}_{\mathcal{G}}(X \cup Y \cup Z) \setminus (X \cup Y \cup Z)$
then in $(\mathcal{G}_{\mathrm{ant}(X \cup Y \cup Z)})^a$ either $\nu$ is separated from $X$ by $Z$, or from $Y$ by $Z$.
Hence $X$ and $Y$ may always be extended to $X^*$, $Y^*$ respectively, such that
$(X^*, Y^* \mid Z) \in \mathfrak{I}_m(\mathcal{G})$ and $X^* \cup Y^* \cup Z = \mathrm{ant}_{\mathcal{G}}(X \cup Y \cup Z)$. Since the
multivariate normal density is strictly positive, for an arbitrary $N \in \mathcal{N}_{|V|}$,

(C5) $A \perp\!\!\!\perp B \mid C \cup D$ and $A \perp\!\!\!\perp C \mid B \cup D$ implies $A \perp\!\!\!\perp B \cup C \mid D$

(see Dawid, 1980). By repeated application of C5 it is sufficient to show that
for each pair $\alpha$, $\beta$ with $\alpha \in X^*$, $\beta \in Y^*$,

$$\alpha \perp\!\!\!\perp \beta \mid (Z \cup X^* \cup Y^*) \setminus \{\alpha, \beta\} \; [N],$$

or equivalently $((\Sigma_{AA})^{-1})_{\alpha\beta} = 0$, where $A = X^* \cup Y^* \cup Z$. Since $(X^*, Y^* \mid Z) \in
\mathfrak{I}_m(\mathcal{G})$, $\alpha$ and $\beta$ are not adjacent in $(\mathcal{G}_A)^a$. The result then follows from
Lemma 8.5 and Lemma 8.17. □

Lemmas 8.17 and 8.18 are based on Lemma 3.1.6 and Theorem 3.1.8
in Koster (1999a), though these results concern a different class of graphs
(see Section 9.2). An alternative proof of Lemma 8.18 for ancestral graphs
without undirected edges is given in Spirtes et al. (1996, 1998).

8.3.4  Distributional  equivalence  of  Markov  equivalent  models 

The  following  corollary  states  that  two  maximal  ancestral  graphs  are  Markov 
equivalent  if  and  only  if  the  corresponding  Gaussian  models  are  equivalent. 

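Corollary 8.19 For maximal ancestral graphs $\mathcal{G}_1$, $\mathcal{G}_2$,

$\mathfrak{I}_m(\mathcal{G}_1) = \mathfrak{I}_m(\mathcal{G}_2)$ if and only if $\mathcal{N}(\mathcal{G}_1) = \mathcal{N}(\mathcal{G}_2)$.

Proof:

$\mathcal{N}(\mathcal{G}_1) = \mathcal{N}(\mathcal{G}_2) \iff \mathcal{N}(\mathfrak{I}_m(\mathcal{G}_1)) = \mathcal{N}(\mathfrak{I}_m(\mathcal{G}_2))$ by Theorem 8.14;

$\iff \mathfrak{I}_m(\mathcal{G}_1) = \mathfrak{I}_m(\mathcal{G}_2)$ by Theorem 8.13.

□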

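Corollary 8.20 If $\mathcal{G} = (V, E)$ is an ancestral graph and $S \cup L \subset V$, then

if $N \in \mathcal{N}(\mathcal{G})$ then $N[^{X_S = x_S}_{L} \in \mathcal{N}(\mathcal{G}[^{S}_{L})$

for all $x_S \in \mathbb{R}^{|S|}$.

Proof: By Lemma 8.18, $\mathcal{N}(\mathcal{G}) \subseteq \mathcal{N}(\mathfrak{I}_m(\mathcal{G}))$. Hence by normality and Theorem 7.1, $N[^{X_S = x_S}_{L} \in \mathcal{N}(\mathfrak{I}_m(\mathcal{G})[^{S}_{L})$. Finally, by Corollary 4.19, $\mathcal{G}[^{S}_{L}$ is maximal, hence

$\mathcal{N}(\mathfrak{I}_m(\mathcal{G})[^{S}_{L}) = \mathcal{N}(\mathfrak{I}_m(\mathcal{G}[^{S}_{L})) = \mathcal{N}(\mathcal{G}[^{S}_{L})$,

by Theorem 4.18 and Theorem 8.14. □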

Suppose that we postulate a Gaussian model $\mathcal{N}(\mathcal{G})$ with complex structure, such as a DAG containing latent variables and/or selection variables. This corollary is significant because it guarantees that if $\mathcal{N}(\mathcal{G})$ contains the 'true' distribution $N^*$, and we then simplify $\mathcal{G}$ to a model for the observed variables, $\mathcal{N}(\mathcal{G}[^{S}_{L})$, then the new model will contain the true 'observable' distribution obtained by marginalizing the unobserved variables and conditioning on the selection variables, $N^*[^{X_S = x_S}_{L}$. (The distribution $N^*[^{X_S = x_S}_{L}$ is termed 'observable' because it is the distribution over the observed variables $V \setminus (S \cup L)$ in the 'selected' subpopulation for which $X_S = x_S$. In general this will obviously not be the distribution observed in a finite sample.)
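
The two operations that produce the 'observable' distribution can be sketched numerically for a Gaussian. The snippet below is an illustration only: the three-variable example (independent $a$, $b$ and a selection variable $s = a + b + e$ with unit-variance noise) and the helper names `marginalize_last` and `condition_on_last` are our own assumptions, not part of the paper. It uses the standard fact that conditioning a multivariate normal on a coordinate replaces the covariance by a Schur complement, which does not depend on the conditioning value.

```python
def marginalize_last(sigma):
    """Covariance after integrating out the last coordinate (drop its row and column)."""
    n = len(sigma) - 1
    return [row[:n] for row in sigma[:n]]


def condition_on_last(sigma):
    """Covariance of the first n-1 coordinates given the last one (Schur complement)."""
    n = len(sigma) - 1
    s = sigma[n][n]
    return [[sigma[i][j] - sigma[i][n] * sigma[n][j] / s for j in range(n)]
            for i in range(n)]


# Hypothetical example: a and b independent with unit variance, s = a + b + e.
# Coordinates are ordered (a, b, s).
sigma = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 1.0],
         [1.0, 1.0, 3.0]]

marginal = marginalize_last(sigma)      # treat s as latent
conditional = condition_on_last(sigma)  # treat s as a selection variable

print(marginal[0][1])      # covariance of a and b with s marginalized
print(conditional[0][1])   # covariance of a and b given s
```

With $s$ marginalized, $a$ and $b$ remain uncorrelated; given $s$, their covariance becomes $-1/3$, so the selected subpopulation exhibits a dependence that any model for the observed variables must be able to represent.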


8.4  Gaussian Ancestral Graph Models Are Curved Exponential Families

Let $\mathcal{S}$ be a full regular exponential family of dimension $m$ with natural parameter space $\Theta \subseteq \mathbb{R}^m$, so $\mathcal{S} = \{P_\theta \mid \theta \in \Theta\}$. If $U$ is an open neighbourhood in $\Theta$, then $\mathcal{S}^U = \{P_\theta \mid \theta \in U\}$. Let $\mathcal{S}_0$ be a sub-family of $\mathcal{S}$, with $\Theta_0$ the corresponding subset of $\Theta$.

If $A$ is open in $\mathbb{R}^m$ then a function $f : A \to \mathbb{R}^m$ is a diffeomorphism of $A$ onto $f(A)$ if $f(\cdot)$ is one-to-one, smooth (infinitely differentiable), and of full rank everywhere on $A$. Corollary A.3 in Kass and Vos (1997) states that a function $f$ is a diffeomorphism if it is smooth, one-to-one, and the inverse $f^{-1} : f(A) \to A$ is also smooth.

Theorem 4.2.1 in Kass and Vos (1997) states that a subfamily $\mathcal{S}_0$ of an $m$-dimensional regular exponential family $\mathcal{S}$ is a locally parametrized curved exponential family of dimension $k$ if for each $\theta_0 \in \Theta_0$ there is an open neighbourhood $U$ in $\Theta$ containing $\theta_0$ and a diffeomorphism $f : U \to \mathbb{R}^k \times \mathbb{R}^{m-k}$, and

$\mathcal{S}_0^U = \{P_\theta \in \mathcal{S}^U \mid f(\theta) = (\psi, 0)\}$.

We use the following fact in the next Lemma:

Proposition 8.21 If $f$ is a rational function defined everywhere on a set $D$ then $f^{(n)}$ is a rational function defined everywhere on $D$.

Proof: The proof is by induction on $n$. Suppose $f^{(n)} = g_n / h_n$, where $g_n$, $h_n$ are polynomials, and $h_n > 0$ on $D$. Then $f^{(n+1)} = (h_n g_n' - g_n h_n') / h_n^2$, from which the conclusion follows (since $h_n^2 > 0$ on $D$). □
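
The induction step can be traced on a concrete function. The following sketch is our own illustration, not part of the paper: polynomials are represented as coefficient lists (lowest degree first), and the helper names and the example $f(x) = x / (1 + x^2)$ on $D = \mathbb{R}$ are assumptions chosen for the demonstration. Applying the quotient rule repeatedly keeps the derivative in the form polynomial over polynomial, with a denominator that is a power of the original one and hence never vanishes on $D$.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_sub(p, q):
    """Difference p - q, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a - b for a, b in zip(p, q)]


def poly_diff(p):
    """Derivative of a polynomial; a constant differentiates to [0]."""
    return [i * a for i, a in enumerate(p)][1:] or [Fraction(0)]


def poly_eval(p, x):
    """Value of the polynomial at x."""
    return sum(a * x ** i for i, a in enumerate(p))


def rational_derivative(g, h):
    """Quotient rule: (g/h)' = (h g' - g h') / h^2, returned as (numerator, denominator)."""
    num = poly_sub(poly_mul(h, poly_diff(g)), poly_mul(g, poly_diff(h)))
    return num, poly_mul(h, h)


# f(x) = x / (1 + x^2), a rational function defined on all of R.
g0 = [Fraction(0), Fraction(1)]
h0 = [Fraction(1), Fraction(0), Fraction(1)]
g1, h1 = rational_derivative(g0, h0)   # f'  = (1 - x^2) / (1 + x^2)^2
g2, h2 = rational_derivative(g1, h1)   # f'' has denominator (1 + x^2)^4
```

Exact rational arithmetic via `fractions.Fraction` is used so that the coefficients can be compared without rounding error.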

Let $S^+_{|V|}$ denote the cone of positive definite $|V| \times |V|$ matrices.

Lemma 8.22 If $\mathcal{G}$ is a complete ancestral graph then the mapping

$f_\mathcal{G} : \Phi(\mathcal{G}) \to \mathbb{R}^{|V|} \times S^+_{|V|}$ given by $(\mu, \Phi) \mapsto (\mu, \Sigma_{\mathcal{G}\Phi})$

is a diffeomorphism from $\Phi(\mathcal{G})$ to $\mathbb{R}^{|V|} \times S^+_{|V|}$.

Proof: Corollary 8.8 establishes that $f_\mathcal{G}$ is one-to-one. Further, by Theorem 8.9, $\mathcal{N}(\mathcal{G}) = \mathcal{N}_{|V|}$, hence

$f_\mathcal{G}(\Phi(\mathcal{G})) = \mathbb{R}^{|V|} \times S^+_{|V|}$.

It remains to show that $f_\mathcal{G}$, $f_\mathcal{G}^{-1}$ are smooth. It follows from equation (1) that the components of $f_\mathcal{G}$ are rational functions of $(\mu, \Phi)$, defined for all $(\mu, \Phi) \in \Phi(\mathcal{G})$. Hence, by Proposition 8.21, $f_\mathcal{G}$ is smooth. Similarly, equation (2) establishes that $f_\mathcal{G}^{-1}$ is smooth. □

Theorem 8.23 For an ancestral graph $\mathcal{G} = (V, E)$, $\mathcal{N}(\mathcal{G})$ is a curved exponential family, with dimension $2 \cdot |V| + |E|$.

Proof: This follows from the definition of $\mathcal{N}(\mathcal{G})$, the existence of a complete ancestral supergraph of $\mathcal{G}$ (Lemma 5.5), Lemma 8.22 and Theorem 4.2.1 of Kass and Vos (1997), referred to above. □

The BIC criterion for the model $\mathcal{N}(\mathcal{G})$ is given by

$\mathrm{BIC}(\mathcal{G}) = -2 \ln L_\mathcal{G}(\hat{\theta}) + \ln(n)\,(2 \cdot |V| + |E|)$,

where $n$ is the sample size, $L_\mathcal{G}(\cdot)$ is the likelihood function and $\hat{\theta}$ is the corresponding MLE for $\mathcal{N}(\mathcal{G})$. A consequence of Theorem 8.23 is that $\mathrm{BIC}(\cdot)$ is an asymptotically consistent criterion for selecting among Gaussian ancestral graph models (see Haughton, 1988).
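
As a small illustration of how the criterion combines fit and dimension, the function below evaluates the formula above. It is a sketch only: the function name `bic` and its arguments are our own, and the maximized log-likelihood is assumed to have been obtained elsewhere, since no fitting procedure is specified here.

```python
import math


def bic(max_log_likelihood, n, num_vertices, num_edges):
    """BIC = -2 ln L(theta_hat) + ln(n) * (2|V| + |E|) for a Gaussian ancestral graph model."""
    dimension = 2 * num_vertices + num_edges  # dimension given by Theorem 8.23
    return -2.0 * max_log_likelihood + math.log(n) * dimension


# Hypothetical comparison of two models on the same 4 variables and n = 200 cases:
# the denser graph fits slightly better but pays for two extra edges.
sparse = bic(max_log_likelihood=-512.0, n=200, num_vertices=4, num_edges=3)
dense = bic(max_log_likelihood=-510.5, n=200, num_vertices=4, num_edges=5)
```

Under this sign convention smaller values are preferred, so in the hypothetical comparison the sparser graph would be selected: a gain of 1.5 in log-likelihood does not offset the penalty $2 \ln 200 \approx 10.6$.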

By  contrast,  Geiger  et  al.  (2001)  have  shown  that  simple  discrete  DAG 
models  with  latent  variables  do  not  form  curved  exponential  families. 

8.5  Parametrization via Recursive Equations with Correlated Errors

The Gaussian model $\mathcal{N}(\mathcal{G})$ can alternatively be parametrized in two pieces via the factorization of the density:

$f(x_V) = f(x_{\mathrm{un}_\mathcal{G}})\, f(x_{V \setminus \mathrm{un}_\mathcal{G}} \mid x_{\mathrm{un}_\mathcal{G}})$.   (8)

The undirected component, $f(x_{\mathrm{un}_\mathcal{G}})$, may be parametrized via an undirected graphical Gaussian model, also known as a covariance selection model (see Lauritzen, 1996; Dempster, 1972).

The directed component, $f(x_{V \setminus \mathrm{un}_\mathcal{G}} \mid x_{\mathrm{un}_\mathcal{G}})$, may be parametrized via a set of recursive equations as follows:

(i) Associate with each $v$ in $V \setminus \mathrm{un}_\mathcal{G}$ a linear equation, expressing $X_v$ as a linear function of the variables for the parents of $v$ plus an error term:

$X_v = \mu_v + \sum_{\pi \in \mathrm{pa}(v)} b^*_{v\pi} X_\pi + \epsilon_v$.

(ii) Specify a non-singular multivariate normal distribution over the error variables $(\epsilon_v)_{v \in V \setminus \mathrm{un}_\mathcal{G}}$ (with mean zero) satisfying the condition that

if there is no edge $\alpha \leftrightarrow \beta$ in $\mathcal{G}$, then $\mathrm{Cov}(\epsilon_\alpha, \epsilon_\beta) = 0$,
but otherwise unrestricted.

Note that $b^*_{\alpha\beta} = -b_{\alpha\beta}$ under the parametrization specified in Section 8.1. The conditional distribution, $f(x_{V \setminus \mathrm{un}_\mathcal{G}} \mid x_{\mathrm{un}_\mathcal{G}})$, is thus parametrized via a simultaneous equation model, of the kind used in econometrics and psychometrics since the 1940s. We describe the system as 'recursive' because the equations may be arranged in upper triangular form, possibly with correlated errors. (Note that some authors only use this term if, in addition, the errors are uncorrelated.) As shown in Theorem 8.7 the set of recursive equations described here also has the special property that the linear coefficients may be consistently estimated via regression of each variable on its parents. This does not hold for recursive equations in general.
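
The recursive equations with correlated errors can be turned into an implied covariance matrix, and the regression property can then be checked at the population level. The sketch below is an illustration under our own assumptions: the graph $c \leftrightarrow a \to b$ (chosen so that $\mathrm{un}_\mathcal{G}$ is empty and the whole density is the directed component), the numerical values, zero means, and the helper names are not taken from the paper. Writing the system as $X = B^* X + \epsilon$ with error covariance $\Omega$, the covariance of $X$ is $T \Omega T^{\top}$ with $T = I + B^* + B^{*2} + \cdots$, a finite sum because the directed part is acyclic.

```python
def matmul(p, q):
    """Plain matrix product of two lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def implied_covariance(b_star, omega):
    """Covariance of X where X = b_star X + eps and Cov(eps) = omega (acyclic b_star)."""
    n = len(b_star)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total, power = identity, identity
    for _ in range(n - 1):              # b_star is nilpotent, so n - 1 powers suffice
        power = matmul(power, b_star)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    total_t = [list(col) for col in zip(*total)]
    return matmul(matmul(total, omega), total_t)


# Hypothetical graph c <-> a -> b, vertices indexed c = 0, a = 1, b = 2.
ic, ia, ib = 0, 1, 2
beta = 0.5
b_star = [[0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0],
          [0.0, beta, 0.0]]             # X_b = beta * X_a + eps_b
omega = [[1.0, 0.3, 0.0],               # Cov(eps_c, eps_a) = 0.3 for the edge c <-> a
         [0.3, 1.0, 0.0],
         [0.0, 0.0, 2.0]]               # no edge b <-> c or a <-> b, so zeros

sigma = implied_covariance(b_star, omega)

# Population regression of b on its only parent a recovers beta.
slope_b_on_a = sigma[ib][ia] / sigma[ia][ia]
# b and c are m-separated given a, so their partial covariance given a vanishes.
partial_bc_given_a = sigma[ib][ic] - sigma[ib][ia] * sigma[ia][ic] / sigma[ia][ia]
```

In this example the population regression coefficient of $X_b$ on $X_a$ equals the structural coefficient even though the error of $a$ is correlated with that of $c$, in line with the property attributed to Theorem 8.7.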

8.5.1  Estimation  procedures 

The parametrization described above thus breaks $\mathcal{N}(\mathcal{G})$ into an undirected graphical Gaussian model and a set of recursive equations with correlated errors. This result is important for the purposes of statistical inference because software packages exist for estimating these models: MIM (Edwards, 1995) fits undirected Gaussian models via the IPS algorithm; AMOS (Arbuckle, 1997), EQS (Bentler, 1986), Proc CALIS (SAS Publishing, 1995) and LISREL (Jöreskog and Sörbom, 1995) are packages which fit structural equation models via numerical optimization. Fitting the two components separately is possible in view of the factorization of the likelihood given by equation (8) and the variation independence of the parameters in these pieces (see Proposition 8.2).

It should be noted that the equations used in the parametrization above are a very special (and simple) subclass of the much more general class of models that structural equation modelling packages can fit; e.g. they only contain observed variables. This motivates the future development of special purpose fitting procedures.

8.5.2  Path  diagrams 

Path  diagrams,  introduced  by  Sewall  Wright  (1921,  1934),  contain  directed 
and  bi-directed  edges,  but  no  undirected  edges,  and  are  used  to  represent 
structural  equations  in  exactly  the  way  described  in  (i)  and  (ii)  above.  Hence 
we  have  the  following: 

Proposition 8.24 If $\mathcal{G}$ is an ancestral graph containing no undirected edges then $\mathcal{N}(\mathcal{G})$ is the model obtained by regarding $\mathcal{G}$ as a path diagram.

Further results relating path diagrams and graphical models are described in Spirtes et al. (1998), Koster (1999a, b, 1996) and Spirtes (1995). The relationship between Gaussian ancestral graph models and Seemingly Unrelated Regression (SUR) models (Zellner, 1962) is discussed in Richardson et al. (1999).


8.6  Canonical DAGs Do Not Provide a Full Parametrization

It was proved in Section 6 that the canonical DAG $\mathcal{D}(\mathcal{G})$ provides a way of reducing the global Markov property for ancestral graphs to that of DAGs. It is thus natural to consider whether the associated Gaussian independence model could be parametrized via the usual parametrization of this DAG. In fact, this does not parametrize all distributions in $\mathcal{N}(\mathfrak{I}_m(\mathcal{G}))$, as shown in the following example:

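Figure 17: (i-a) An ancestral graph $\mathcal{G}_1$; (i-b) the corresponding canonical DAG, $\mathcal{D}(\mathcal{G}_1)$; (ii-a) an ancestral graph $\mathcal{G}_2$; (ii-b) the canonical DAG, $\mathcal{D}(\mathcal{G}_2)$.

Consider the ancestral graph $\mathcal{G}_1$, and the associated canonical DAG, $\mathcal{D}(\mathcal{G}_1)$, shown in Figure 17(i-a) and (i-b). Since $\mathfrak{I}_m(\mathcal{G}_1) = \emptyset$, $\mathcal{N}(\mathfrak{I}_m(\mathcal{G}_1)) = \mathcal{N}_3$, the saturated model on 3 variables. However, if $N$ is a distribution given by a parametrization of $\mathcal{D}(\mathcal{G}_1)$, then it follows by direct calculation that

$\min\{\rho_{ab}, \rho_{bc}, \rho_{ac}\} < \frac{1}{\sqrt{2}}$,

where $\rho_{vw}$ is the correlation between $X_v$ and $X_w$ (see Spirtes et al., 1998). Since this does not hold for all distributions in $\mathcal{N}_3$, there are normal distributions $N \in \mathcal{N}(\mathfrak{I}_m(\mathcal{G}_1))$ for which there is no distribution $N^* \in \mathcal{N}(\mathcal{D}(\mathcal{G}_1))$ such that $N = N^*[_{L}$ (the marginal of $N^*$ after integrating out the latent vertices $L$ of $\mathcal{D}(\mathcal{G}_1)$).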
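
A concrete member of $\mathcal{N}_3$ that violates the correlation bound is easy to exhibit. The check below is a sketch of our own: the function names and the choice of all three correlations equal to 0.8 are illustrative assumptions. Positive definiteness of a $3 \times 3$ correlation matrix is tested with Sylvester's criterion (all leading principal minors positive).

```python
import math


def is_valid_correlation_matrix(r_ab, r_bc, r_ac):
    """Sylvester's criterion for the unit-diagonal 3x3 matrix with these correlations."""
    second_minor = 1.0 - r_ab ** 2
    determinant = 1.0 + 2.0 * r_ab * r_bc * r_ac - r_ab ** 2 - r_bc ** 2 - r_ac ** 2
    return second_minor > 0.0 and determinant > 0.0


def satisfies_canonical_dag_bound(r_ab, r_bc, r_ac):
    """Necessary condition stated in the text: min of the correlations is below 1/sqrt(2)."""
    return min(r_ab, r_bc, r_ac) < 1.0 / math.sqrt(2.0)


rho = (0.8, 0.8, 0.8)
in_saturated_model = is_valid_correlation_matrix(*rho)
obeys_bound = satisfies_canonical_dag_bound(*rho)
```

Here the matrix is positive definite (its determinant is about 0.104), so the distribution lies in the saturated model, yet the smallest correlation, 0.8, exceeds $1/\sqrt{2} \approx 0.707$. By the bound above, such a distribution cannot arise from a parametrization of the canonical DAG.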

Lauritzen (1998) p.12 gives an analogous example for conditioning, by considering the graph $\mathcal{G}_2$, with canonical DAG, $\mathcal{D}(\mathcal{G}_2)$, shown in Figure 17(ii-a) and (ii-b). Lauritzen shows that there are normal distributions $N \in \mathcal{N}(\mathfrak{I}_m(\mathcal{G}_2))$ for which there is no distribution $N^* \in \mathcal{N}(\mathcal{D}(\mathcal{G}_2))$ such that $N$ is obtained from $N^*$ by conditioning on the selection vertices of $\mathcal{D}(\mathcal{G}_2)$.

These negative results are perhaps surprising given the very simple nature of the structure in $\mathcal{D}(\mathcal{G})$, but serve to illustrate the complexity of the sets of distributions represented by such models.

9  Relation  to  Other  Work 

The problem of constructing graphical representations for the independence structure of DAGs under marginalizing and conditioning was originally posed by N. Wermuth in 1994 in a lecture at GMU. Wermuth, Cox and Pearl developed an approach to this problem based on summary graphs (see Wermuth et al., 1994, 1999; Cox and Wermuth, 1996; Wermuth and Cox, 2000). More recently J. Koster has introduced another class of graphs, called MC-graphs, together with an operation of marginalizing and conditioning. (See Koster, 2000; Koster, 1999a; Koster, 1999b.)

In Figure 18 we show two examples of data generating processes, together with the maximal ancestral graph, summary graph and MC-graphs resulting after marginalizing (i) and conditioning (ii).

Simple representations for DAGs under marginalization alone were proposed by Verma (1993), who defined an operation of projection which transforms a DAG with latent variables to another DAG in which each latent variable has exactly two children, both of which are observed (called a 'semi-Markovian model'). The operation is defined so that the DAG and its projection are Markov equivalent over the common set of observed variables. This approach does not lead to a full parametrization of the independence model for the reasons discussed in Section 8.6.

In  this  section  we  will  briefly  describe  the  classes  of  summary  graphs  and 
MC-graphs.  We  then  outline  the  main  differences  and  similarities  to  the 
class  of  maximal  ancestral  graphs.  Finally  we  discuss  the  relation  between 
ancestral  graphs  and  chain  graphs. 

9.1  Summary  Graphs 

A summary graph is a graph containing three types of edge: $-$, $\to$ and ----. Directed cycles may not occur in a summary graph, but it is possible for there to be a dashed line ($\alpha$ ---- $\beta$) and at the same time a directed path from $\alpha$ to $\beta$. Thus there may be two edges between a pair of vertices, e.g. both $\alpha \to \beta$ and $\alpha$ ---- $\beta$. This is the only combination of multiple edges that is permitted. The separation criterion for summary graphs is equivalent to m-separation after substituting bi-directed edges ($\leftrightarrow$) for dashed edges (----).

Wermuth et al. (1999) present an algorithm for transforming a summary graph so as to represent the independence structure remaining among the variables after marginalizing and conditioning. This procedure will not, in general, produce a graph that obeys a pairwise Markov property, hence there may be a pair of vertices $\alpha$, $\beta$ that are not adjacent and yet there is no subset $Z$ of the remaining vertices for which the model implies $\alpha \perp\!\!\!\perp \beta \mid Z$. The graph in Figure 18(i-c) illustrates this. There is no edge between $a$ and $c$, and yet $a \not\perp\!\!\!\perp c$ and $a \not\perp\!\!\!\perp c \mid b$. This example also illustrates that there may be more edges than pairs of adjacent vertices in a summary graph.
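
The failure of a pairwise Markov property can be checked mechanically by searching over all candidate separating sets. The sketch below is our own illustration and rests on several assumptions: graphs contain only directed and bi-directed edges (dashed edges having been replaced by bi-directed ones), m-separation is tested through separation in the augmented graph of the ancestral subgraph, and the four-vertex example is hypothetical; it is not the graph of Figure 18(i-c). The function names are ours.

```python
from itertools import combinations


def ancestors(directed, targets):
    """Return `targets` plus every vertex with a directed path into `targets`."""
    found, frontier = set(targets), list(targets)
    while frontier:
        child = frontier.pop()
        for parent, head in directed:
            if head == child and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def m_separated(directed, bidirected, X, Y, Z):
    """True if X and Y are separated by Z in the augmented graph of an(X u Y u Z)."""
    X, Y, Z = set(X), set(Y), set(Z)
    keep = ancestors(directed, X | Y | Z)
    d = [(u, v) for u, v in directed if u in keep and v in keep]
    b = [(u, v) for u, v in bidirected if u in keep and v in keep]

    # Districts: connected components of the bi-directed part of the subgraph.
    district = {v: {v} for v in keep}
    for u, v in b:
        if district[u] is not district[v]:
            merged = district[u] | district[v]
            for w in merged:
                district[w] = merged

    # Two vertices are collider connected exactly when both lie in some
    # district together with its parents, so each such set forms a clique.
    adj = {v: set() for v in keep}
    for comp in district.values():
        clique = comp | {u for u, v in d if v in comp}
        for s, t in combinations(clique, 2):
            adj[s].add(t)
            adj[t].add(s)

    # Undirected reachability from X that never enters Z.
    reached, frontier = set(X), list(X)
    while frontier:
        v = frontier.pop()
        for w in adj[v] - Z:
            if w not in reached:
                reached.add(w)
                frontier.append(w)
    return not (reached & Y)


def has_separating_set(directed, bidirected, x, y, vertices):
    """Search every subset Z of the other vertices for one that m-separates x and y."""
    others = [v for v in vertices if v not in (x, y)]
    return any(m_separated(directed, bidirected, {x}, {y}, set(Z))
               for k in range(len(others) + 1)
               for Z in combinations(others, k))


# Hypothetical graph: a <-> c <-> d <-> b with c -> b and d -> a; a, b not adjacent.
directed = [("c", "b"), ("d", "a")]
bidirected = [("a", "c"), ("c", "d"), ("d", "b")]
print(has_separating_set(directed, bidirected, "a", "b", "abcd"))  # False
```

In the example, $a$ and $b$ are not adjacent, yet no subset of $\{c, d\}$ separates them, so the missing edge does not correspond to any independence relation.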

Wermuth and Cox (2000) present a new method for constructing a summary graph based on applying 'sweep' operators to matrices whose entries indicate the presence or absence of edges. Kauermann (1996) analyses the subset of summary graphs that only involve dashed edges, which are also known as covariance graphs.

Figure 18: (i-a) a DAG generating process $\mathcal{D}_1$; (i-b) the ancestral graph $\mathcal{D}_1[^{\emptyset}_{\{l_1, l_2\}}$; the summary graph (i-c) and MC-graph (i-d) resulting from marginalizing $l_1, l_2$ in $\mathcal{D}_1$. (ii-a) a DAG generating process $\mathcal{D}_2$; (ii-b) the ancestral graph $\mathcal{D}_2[^{\{s\}}_{\emptyset}$; the summary graph (ii-c) and MC-graph (ii-d) resulting from conditioning on $s$ in $\mathcal{D}_2$.

9.2  MC-Graphs 

Koster (1999a, b) considers MC-graphs, which include the three edge types $-$, $\to$, $\leftrightarrow$, but in addition may also contain undirected self-loops (see vertex $b$ in Figure 18(ii-d)). Up to four edges may be present between a pair of vertices, e.g. $\alpha - \beta$, $\alpha \to \beta$, $\alpha \leftarrow \beta$ and $\alpha \leftrightarrow \beta$.

The global Markov property used for MC-graphs is identical to the m-separation criterion (Koster names the criterion 'd-separation' because it is a natural generalization of the criterion for DAGs). Koster presents a procedure for transforming the graph under marginalizing and conditioning. As with the summary graph procedure, the transformed graph will not generally obey a pairwise Markov property, and may have more edges than there are pairs of vertices.

9.3  Comparison  of  Approaches 

The three classes of graphs (ancestral graphs, summary graphs and MC-graphs) have been developed with similar goals in mind, hence it is not surprising that in certain respects they are similar. However, there are also a number of differences between the approaches.

For the rest of this section we will ignore the notational distinction between dashed lines (----) and bi-directed edges ($\leftrightarrow$) by treating them as if they were the same symbol.

9.3.1  Graphical  and  Markov  structure 

The following (strict) inclusions relate the classes of graphs:

maximal ancestral $\subset$ ancestral $\subset$ summary $\subset$ MC.

Essentially the same separation criterion is used for ancestral graphs, summary graphs and MC-graphs. Further, defining $\mathfrak{I}[\cdot]$ to denote a class of independence models, we have:

$\mathfrak{I}$[maximal ancestral] $=$ $\mathfrak{I}$[ancestral] $=$ $\mathfrak{I}$[summary] $\subset$ $\mathfrak{I}$[MC].

The first equality is Theorem 5.1; the second equality follows by a construction similar to the canonical DAG (Section 6). The last inclusion is strict because MC-graphs include directed cyclic graphs which, in general, are not Markov equivalent to any DAG under marginalization and conditioning (Richardson, 1996). In addition, there are MC-graphs which cannot be obtained by applying the marginalizing and conditioning transformation to a graph containing only directed edges: Figure 19 gives an example. Thus the class of MC-graphs is larger than required for representing directed graphs under marginalizing and conditioning. The direct analogues to Theorems 6.3 and 6.4 do not hold.

In the summary graph formed by the procedures described in Wermuth et al. (1999) and Wermuth and Cox (2000), the configurations $- \gamma$ ---- and $- \gamma \leftarrow$ never occur. This is equivalent to condition (ii) in the definition of an ancestral graph. Consequently, as noted by Wermuth et al. (1999), a decomposition of the type shown in Figure 4 is possible for summary graphs. However, though directed cycles do not occur in summary graphs, the analogue to condition (i) does not hold, since it is possible to have an edge $\alpha$ ---- $\beta$ and a directed path from $\alpha$ to $\beta$.

The marginalizing and conditioning transformation operations for summary graphs and MC-graphs are 'local' in that they make changes to triples of adjacent vertices. In contrast the transformation $\mathcal{G} \mapsto \mathcal{G}[^{S}_{L}$ requires pairwise tests of m-separation to be carried out in order to determine the adjacencies present in $\mathcal{G}[^{S}_{L}$. This may make the transformation harder for a human to carry out. On the other hand the transformation given by Wermuth is recursive, and tests for the existence of an m-connecting path can be performed by a recursive procedure that only examines triples of adjacent vertices. It can be said that the MC-graph and summary graph transformations may in general be performed in fewer steps than the ancestral graph transformation.

Figure 19: An MC-graph which cannot be obtained by applying the marginalizing and conditioning transformation given by Koster (2000) to a graph which contains only directed edges. Further, the independence model corresponding to this MC-graph cannot be obtained by marginalizing and conditioning an independence model represented by a directed graph.

However, a price is paid for not performing these tests of m-separation: whereas $\mathcal{G}[^{S}_{L}$ always obeys a pairwise Markov property (Corollary 4.19), the summary graphs and MC-graphs resulting from the transformations do not do so in general. This is a disadvantage in a visual representation of an independence model insofar as it conflicts with the intuition, based on separation in undirected graphs, that if two vertices are not connected by an edge then they are not directly connected and hence may be made independent by conditioning on an appropriate subset of the other vertices.

9.3.2  Gaussian  parametrization 

For  summary  graphs,  as  for  ancestral  graphs,  the  Gaussian  parametrization 
consists  of  a  conditional  distribution  and  a  marginal  distribution.  Once  again, 
the  marginal  parametrization  is  specified  via  a  covariance  selection  model  and 
the  conditional  distribution  via  a  system  of  structural  equations  of  the  type 
used  in  econometrics  and  psychometrics  as  described  in  Section  8.5  (see  Cox 
and  Wermuth,  1996).  Under  this  parametrization  one  parameter  is  associated 
with  each  edge  and  vertex  in  the  graph. 

As described above, it is possible for a summary graph to contain more edges than there are pairs of adjacent vertices. Consequently, the Gaussian model associated with a summary graph will not be identified in general, and the analogous result to Corollary 8.8 will not hold. Thus the summary graph model will sometimes contain more parameters than needed to parametrize the corresponding Gaussian independence model.
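
The over-parametrization can be seen by simple counting. The sketch below is our own illustration, restricted to covariance parameters and using hypothetical function names; it assumes, as stated above, one parameter per vertex and one per edge, and compares this with the number of free entries in a covariance matrix.

```python
def covariance_dimension(num_vertices):
    """Number of free entries in a symmetric num_vertices x num_vertices covariance matrix."""
    return num_vertices * (num_vertices + 1) // 2


def covariance_parameter_count(num_vertices, num_edges):
    """One covariance parameter per vertex plus one per edge."""
    return num_vertices + num_edges


# Hypothetical summary graph on two vertices carrying both a directed and a dashed edge.
summary_graph_params = covariance_parameter_count(num_vertices=2, num_edges=2)
available = covariance_dimension(2)

# An ancestral graph has at most one edge per pair, so it can never exceed the bound.
ancestral_params = covariance_parameter_count(num_vertices=2, num_edges=1)
```

With two edges between the same pair of vertices there are 4 covariance parameters for a model whose covariance matrix has only 3 free entries, so the parameters cannot all be identified. With at most one edge per pair the count never exceeds the number of free entries.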

On the other hand, as mentioned in the previous section, summary graphs do not satisfy a pairwise Markov property, and hence the associated model will not parametrize all Gaussian distributions satisfying the Markov property for the graph. In particular, the comments concerning non-maximal ancestral graphs apply to summary graphs (see Section 8.3.1). In other words, parametrization of a summary graph does not, in general, lead to a full parametrization of the independence model (see Theorem 8.14). In this sense the summary graph model sometimes contains too few parameters.

As  a  consequence,  two  Markov  equivalent  summary  graphs  may  represent 
different  sets  of  Gaussian  distributions,  so  the  analogue  to  Corollary  8.19  does 
not  hold.  Thus  for  the  purpose  of  parametrizing  Gaussian  independence 
models,  the  class  of  maximal  ancestral  graphs  has  advantages  over  summary 
graphs  (and  non-maximal  ancestral  graphs). 

It should be stressed, however, that the fact that a summary graph model may impose additional non-Markovian restrictions can be seen as an advantage insofar as it may lead to more parsimonious models. For this purpose ideally one would wish to develop a graphical criterion that would also allow the non-independence restrictions to be read from the graph. In addition, one would need to show that the analogue to Corollary 8.20 held for the transformation operation, so that any non-Markovian restrictions imposed by the model associated with the transformed summary graph were also imposed by the original model. Otherwise there is the possibility that while the original model contained the true population distribution, by introducing an additional non-Markovian constraint, the model after transformation no longer contains the true distribution. The approach in Wermuth and Cox (2000) considers the parametrization as derived from the original DAG in the manner of structural equation models with latent variables. Under this scheme the same summary graph may have different parametrizations. An advantage of this scheme is that the strengths of the associations may be calculated if we know the parameters of the generating DAG.

Finally, note that the linear coefficients occurring in the equations in a summary graph model do not always have a population interpretation as regression coefficients. This is because there may be an edge $\alpha$ ---- $\beta$ and a directed path from $\alpha$ to $\beta$. (However, coefficients associated with edges $v \to \delta$ where $v$ is a vertex in the undirected subgraph do have this interpretation, as noted by Wermuth and Cox (2000).) Hence the analogue to Theorem 8.7 does not hold for all summary graphs.

Koster (1999a, b) does not discuss parametrization of MC-graphs; however, all of the above comments will apply to any parametrization which associates one parameter with each vertex and edge. Indeed, under such a scheme identifiability will be more problematic than for summary graphs because MC-graphs permit more edges between vertices in addition to self-loops.

9.4  Chain  Graphs 

A mixed graph containing no partially directed cycles, and no bi-directed edges, is called a chain graph. (Recall that a partially directed cycle is an anterior path from $\alpha$ to $\beta$, together with an edge $\beta \to \alpha$.) There is an extensive body of work on chain graphs. (See Lauritzen (1996) for a review.)

As was shown in Lemma 3.2(c), an ancestral graph does not contain partially directed cycles, hence we have the following:

Proposition 9.1 If $\mathcal{G}$ is an ancestral graph containing no bi-directed edges then $\mathcal{G}$ is a chain graph.

In  fact,  it  is  easy  to  see  that  the  set  of  ancestral  chain  graphs  are  the 
recursive  ‘causal’  graphs  introduced  by  Kiiveri  et  al.  (1984);  see  also  Lauritzen 
and  Richardson  (2002)  and  Richardson  (2001). 

Two different global Markov properties have been proposed for chain graphs. Lauritzen and Wermuth (1989) and Frydenberg (1990a) proposed the first Markov property for chain graphs. More recently Andersson et al. (2001, 1996) have proposed an alternative Markov property. We will denote the resulting independence models $\mathfrak{I}_{\mathrm{LWF}}(\mathcal{G})$ and $\mathfrak{I}_{\mathrm{AMP}}(\mathcal{G})$ respectively.

The m-separation criterion as applied to chain graphs produces yet another Markov property. (This observation is also made by Koster (1999a).) In general all three properties will be different, as illustrated by the chain graph in Figure 20(i). Under both the AMP and LWF properties $a \perp\!\!\!\perp b$ in $\mathrm{CG}_1$, but this does not hold under m-separation because the path $a \to x - y \leftarrow b$ m-connects $a$ and $b$ given the empty set. The AMP property implies $a \perp\!\!\!\perp y$, while this is not implied by m-separation or the LWF property. Note that under m-separation this chain graph is Markov equivalent to an undirected graph.

Figure 20: Chain graphs that are not Markov equivalent to any ancestral graph under (i) the LWF property, (ii) the AMP property; (iii) an ancestral graph for which there is no Markov equivalent chain graph (under either Markov property).

However, if we restrict our attention to ancestral graphs then we have the following proposition:

Proposition 9.2 If $\mathcal{G}$ is an ancestral graph which is also a chain graph then

$\mathfrak{I}_m(\mathcal{G}) = \mathfrak{I}_{\mathrm{LWF}}(\mathcal{G}) = \mathfrak{I}_{\mathrm{AMP}}(\mathcal{G})$.

This proposition is an immediate consequence of clause (i) in the definition of an ancestral graph, which implies that there are no immoralities, flags or bi-flags in an ancestral mixed graph. (See Frydenberg (1990a) and Andersson et al. (1996) for the relevant definitions.)

Finally, note that under both the LWF and AMP Markov properties there exist chain graphs that are not Markov equivalent to any ancestral graph. Examples are shown in Figure 20(i), (ii). It follows that these Markov models could not have arisen from any DAG generating process. (See Lauritzen and Richardson (2002) and Richardson (1998) for further discussion.) Conversely, Figure 20(iii) shows an example of an independence model represented by an ancestral graph that is not Markov equivalent to any chain graph (under either chain graph Markov property).


10  Discussion 

In this paper we have introduced the class of ancestral graph Markov models. The purpose in introducing this class was to be able to characterize the Markov structure of a DAG model under marginalizing and conditioning. To this end we defined a graphical transformation $\mathcal{G} \mapsto \mathcal{G}[^{S}_{L}$, which corresponded to marginalizing and conditioning the corresponding independence model (Theorem 4.18).

If  a  DAG  model  containing  latent  or  selection  variables  is  hypothesized 
as  the  generating  mechanism  for  a  given  system  then  this  transformation  will 
allow  a  simple  representation  of  the  Markov  model  induced  on  the  observed 
variables. 

However,  often  graphical  models  are  used  for  exploratory  data  analysis, 
where  little  is  known  about  the  generating  structure.  In  such  situations  the 
existence  of  this  transformation  provides  a  guarantee:  if  the  data  were  gen¬ 
erated  by  an  unknown  DAG  containing  hidden  variables  then  we  are  ensured 
that  there  exists  an  ancestral  graph  which  can  represent  the  resulting  Markov 
structure  over  the  observed  variables.  Hence  the  problem  of  additional  and 
misleading  edges  encountered  in  the  introduction  may  be  avoided.  In  this 
context  the  transformation  provides  a  justification  for  using  the  class  of  an¬ 
cestral  graphs. 

However,  any  interpretation  of  the  types  of  edge  present  in  an  ancestral
graph  that  was  arrived  at  via  an  exploratory  analysis  should  take  into
account  that  there  may  exist  (many)  different  graphs  that  are  Markov
equivalent.  Spirtes  and  Richardson  (1997)  present  a  polynomial-time  algorithm
for  testing  Markov  equivalence  of  two  ancestral  graphs.  Spirtes  et  al.  (1995,
1999)  describe  an  algorithm  for  inferring  structural  features  that  are
common  to  all  maximal  ancestral  graphs  in  a  Markov  equivalence  class.  For
instance,  there  are  Markov  equivalence  classes  in  which  every  member
contains  a  directed  path  from  some  vertex  $\alpha$  to  a  second  vertex  $\beta$;  likewise  in
other  Markov  equivalence  classes  no  member  contains  a  directed  path  from  $\alpha$
to  $\beta$.  At  the  time  of  writing  there  is  not  yet  a  full  characterization  of  common
features,  such  as  exists  for  DAG  Markov  equivalence  classes  (see  Andersson
et  al.,  1997;  Meek,  1995a).

Finally,  we  showed  that  maximal  ancestral  graphs  lead  to  a  natural
parametrization  of  the  set  of  Gaussian  distributions  obeying  the  global  Markov
property  for  the  graph.  Conditions  for  the  existence  and  uniqueness  of
maximum  likelihood  estimates  for  these  models  remain  an  open  question.

Development  of  a  parametrization  for  discrete  distributions  is  another
area  of  current  research.  Richardson  (2002)  describes  a  local  Markov
property  for  a  class  of  graphs  that  includes  all  ancestral  graphs  without  undirected
edges.  This  local  Markov  property  is  equivalent  to  the  global  Markov
property,  and  may  thus  facilitate  the  development  of  a  discrete  parametrization.


A  Appendix


A.1  Definition  of  a  Mixed  Graph


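Let  $\mathcal{E} = \{-, \leftarrow, \rightarrow, \leftrightarrow\}$  be  the  set  of  edges.  Let  $\mathfrak{P}(\mathcal{E})$  denote  the  power  set
of  $\mathcal{E}$.  Formally,  a  mixed  graph  $\mathcal{G} = (V, E)$  is  an  ordered  pair  consisting  of
a  finite  set  $V$,  and  a  mapping  $E : V \times V \to \mathfrak{P}(\mathcal{E})$,  subject  to  the  following
restrictions:

\[
\begin{aligned}
E(\alpha, \alpha) &= \emptyset, \\
- \in E(\alpha, \beta) &\iff - \in E(\beta, \alpha), \\
\leftarrow \in E(\alpha, \beta) &\iff \rightarrow \in E(\beta, \alpha), \\
\leftrightarrow \in E(\alpha, \beta) &\iff \leftrightarrow \in E(\beta, \alpha).
\end{aligned}
\]

The  induced  subgraph,  $\mathcal{G}_A$  of  $\mathcal{G}$  on  $A \subseteq V$,  is  $(A, E|_A)$  where  $E|_A$  is  the  natural
restriction  of  $E$  to  $A \times A$.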
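
The definition above can be illustrated with a small data structure. The following is a minimal Python sketch, not part of the original paper: the class name `MixedGraph`, its methods, and the string encoding of the four edge types are all hypothetical choices made for illustration. It stores the mapping E as a dictionary from ordered pairs of vertices to sets of edge types, keeps the symmetry restrictions in force whenever an edge is added, and builds the induced subgraph by restricting E to A x A.

```python
from itertools import product

# Edge types, read from the first argument of E towards the second.
# These string encodings are an illustrative assumption.
UNDIRECTED, LEFT, RIGHT, BIDIRECTED = "-", "<-", "->", "<->"

# The type that must appear in E(b, a) whenever a type appears in E(a, b).
MIRROR = {UNDIRECTED: UNDIRECTED, LEFT: RIGHT, RIGHT: LEFT,
          BIDIRECTED: BIDIRECTED}


class MixedGraph:
    """Hypothetical container for a mixed graph (V, E)."""

    def __init__(self, vertices):
        self.vertices = frozenset(vertices)
        # E maps every ordered pair to a (possibly empty) set of edge types.
        self.E = {pair: set() for pair in product(self.vertices, repeat=2)}

    def add_edge(self, a, edge, b):
        """Record `edge` in E(a, b) and its mirror image in E(b, a)."""
        if a == b:
            raise ValueError("E(a, a) must be empty")
        mirrored = MIRROR[edge]  # KeyError for an unknown edge type
        self.E[(a, b)].add(edge)
        self.E[(b, a)].add(mirrored)

    def is_valid(self):
        """Check the restrictions placed on E by the definition."""
        for (a, b), edges in self.E.items():
            if a == b and edges:
                return False
            if {MIRROR[e] for e in edges} != self.E[(b, a)]:
                return False
        return True

    def induced_subgraph(self, subset):
        """Return (A, E restricted to A x A) for a subset A of V."""
        keep = frozenset(subset)
        if not keep <= self.vertices:
            raise ValueError("subset must be contained in V")
        sub = MixedGraph(keep)
        for pair in product(keep, repeat=2):
            sub.E[pair] = set(self.E[pair])
        return sub
```

For example, after `g = MixedGraph("abc")` and `g.add_edge("a", "->", "b")`, the entry `g.E[("b", "a")]` holds the mirrored type `"<-"`, and `g.induced_subgraph("ab")` keeps that edge while dropping every pair involving `"c"`. Storing both orientations is redundant, but it mirrors the definition directly and makes the restrictions easy to check.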